Abstract

We address the task of estimating multiple trajectories from unlabeled data. This problem arises
in many settings: for example, the construction of maps of transport networks from passive
observation of travellers, or the reconstruction of the behaviour of uncooperative vehicles from external
observations. There are two coupled problems. The first is a data association problem:
how to map data points onto individual trajectories. The second is, given a solution to the data
association problem, to estimate those trajectories. We construct estimators as a solution to a
regularized variational problem (to which approximate solutions can be obtained via the simple, efficient
and widespread k-means method) and show that, as the number of data points, $n$, increases, these
estimators exhibit stable behaviour. More precisely, we show that they converge in an appropriate
Sobolev space in probability and with rate $n^{-1/2}$.


1 Introduction 

Given observations from multiple moving targets we face two (coupled) problems. The first is
associating observations to targets: the data association problem. The second is estimating the trajectory of each
target given the appropriate set of observations. When there is exactly one target the data association 
problem is trivial. However, when the number of targets is greater than one (even when the number of 
targets is known) the set of data association hypotheses grows combinatorially with the number of data 
points. Very quickly it becomes infeasible to check every possibility. Hence fast approximate solutions 
are needed in practice. 

In this paper we interpret the problem of estimating multiple trajectories with unknown data
association (see Figure 1) in such a way that the k-means method [32] may be applied to find a solution. As
in [42], this is a non-standard application of the k-means method in which we generalize the notion of
a 'cluster center' to partition finite dimensional data using infinite dimensional cluster centers. In this
paper the cluster centers are trajectories in some function space and the data are space-time observations.

Let $\Theta \subset (H^s)^k$ where $H^s$ is the Sobolev space of degree $s$ (where we consider the case $s > 1$, see
Section 2.1 for a precise definition). We have a data set $\{(t_i, y_i)\}_{i=1}^n \subset [0,1] \times \mathbb{R}^d$ and a model for the
observation process

$$ y_i = \mu^\dagger_{\varphi(i)}(t_i) + \epsilon_i \tag{1} $$

where $\mu^\dagger = (\mu^\dagger_1, \dots, \mu^\dagger_k)$ is some unknown function, $\epsilon_i \sim \phi_0$ and $t_i \sim \phi_T$ for densities $\phi_0$ and $\phi_T$ on
$\mathbb{R}^d$ and $[0,1]$ respectively. We assume that the index of the cluster responsible for any given observation
is an independent random variable with a categorical distribution of parameter vector $p = (p_1, \dots, p_k)$,
writing $\varphi(i) \sim \mathrm{Cat}(p)$ to mean $\mathbb{P}(\varphi(i) = j) = p_j$. These assumptions allow us to write the density of $y$
given $t$ (and, implicitly, the cluster centres), which we denote by $\phi_Y(y|t)$, as

$$ \phi_Y(y|t) = \sum_{j=1}^k p_j \, \phi_0\big(y - \mu^\dagger_j(t)\big). $$

Figure 1: Unlabeled data is generated from three targets and using minimizers of (2) we can find a partitioning of
the data set and nonparametrically estimate each trajectory using the k-means algorithm.

We can summarize the stylized data generating process as follows. A cluster is selected at random:
$\mathbb{P}(\varphi = j) = p_j$; the time and observation error are drawn independently from their respective
distributions, $t \sim \phi_T$ and $\epsilon \sim \phi_0$; and we observe $(t, y = \mu^\dagger_\varphi(t) + \epsilon)$.
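
The data generating process just described can be simulated directly. The following is a minimal sketch, not taken from the paper: the function name `simulate_sda_data` is hypothetical, the time density $\phi_T$ is assumed uniform on $[0,1]$ and the noise density $\phi_0$ is assumed to be an isotropic Gaussian. Any other continuous densities could be substituted.

```python
import numpy as np

def simulate_sda_data(n, trajectories, p, noise_std=0.1, seed=0):
    """Draw n unlabeled space-time observations (t_i, y_i) following model (1).

    trajectories: list of k callables; each maps an array of m times to the
        m positions of one target (hypothetical stand-ins for mu^dagger_j).
    p: length-k probability vector, the parameter of Cat(p).
    """
    rng = np.random.default_rng(seed)
    k = len(trajectories)
    t = rng.uniform(0.0, 1.0, size=n)        # assumption: phi_T is uniform
    labels = rng.choice(k, size=n, p=p)      # hidden association varphi(i)
    # Infer the spatial dimension d from a single evaluation.
    d = np.asarray(trajectories[0](t[:1])).size
    y = np.empty((n, d))
    for j, mu in enumerate(trajectories):
        mask = labels == j
        y[mask] = np.asarray(mu(t[mask]), dtype=float).reshape(mask.sum(), d)
    # assumption: phi_0 is a centered Gaussian with standard deviation noise_std
    y += noise_std * rng.standard_normal((n, d))
    return t, y, labels
```

In the estimation problem only `t` and `y` are available; `labels` is returned here purely so that a reconstructed association can be compared with the truth.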

The aim is to estimate $\mu^\dagger = (\mu^\dagger_1, \dots, \mu^\dagger_k) \in \Theta$ from observed data $\{(t_i, y_i)\}_{i=1}^n$. In particular the
data association

$$ \varphi : \{1, 2, \dots, n\} \to \{1, 2, \dots, k\} $$

is unknown. With a single trajectory ($k = 1$) the problem is precisely the spline smoothing problem,
see for example [46]. For $k > 1$ trajectories there is an additional data association problem coupled to
the spline smoothing problem. We call this the smoothing-data association (SDA) problem. Although
the estimator $\mu^n$ we propose is not necessarily a consistent estimator for $\mu^\dagger$ (we do not show $\mu^n \to \mu^\dagger$)
we do consider our estimator a natural choice. We believe it is possible to bound the asymptotic error
$\lim_{n\to\infty} \|\mu^n - \mu^\dagger\|_{(L^2)^k} \le C$ where $C$ depends on the distribution of the data points; however, it is
beyond the scope of this work to show such a bound. We refer to [28, Section 4.5] for a bound of the
type $\|\mu^\infty - \mu^\dagger\| \le C$, where $\mu^\infty = \lim_{n\to\infty} \mu^n$, for k-means in Hilbert spaces.

We assume $k$ is fixed and known. The aim of this paper is to construct a sequence of estimators $\mu^n$
of $\mu^\dagger$ based upon increasing sets of observations $\{(t_i, y_i)\}_{i=1}^n$ and to study their asymptotic behavior as
$n \to \infty$. For each $n$ our estimate is given as a minimizer of $f_n : \Theta \to \mathbb{R}$ defined by

$$ f_n(\mu) = \frac{1}{n} \sum_{i=1}^n \bigwedge_{j=1}^k |y_i - \mu_j(t_i)|^2 + \lambda \sum_{j=1}^k \|\nabla^s \mu_j\|_{L^2}^2 \tag{2} $$

where $|\cdot|$ is the Euclidean norm on $\mathbb{R}^d$, $\bigwedge_{j=1}^k z_j = \min\{z_1, \dots, z_k\}$ and $\lambda$ is a positive constant.
Penalizing the $s$th derivative ensures that the problem is well posed. Optimizing this function can be
interpreted as seeking a hard data association: given $\mu \in \Theta$ each observation $(t_i, y_i)$ is associated with
the trajectory closest to it so the corresponding data association solution is given by

$$ \varphi_\mu(i) = \operatorname*{argmin}_{j = 1, 2, \dots, k} |\mu_j(t_i) - y_i|. $$
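
To make the objective (2) and the induced hard association concrete, here is a minimal sketch under simplifying assumptions that are not part of the paper: the observations are scalar ($d = 1$), and each candidate trajectory $\mu_j$ is a polynomial stored by its coefficients (lowest degree first), so that $\|\nabla^s \mu_j\|_{L^2}^2$ on $[0,1]$ can be computed exactly. The function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def hard_association(coefs, t, y):
    """Return varphi_mu(i) and the squared distance to the closest trajectory.

    coefs: (k, m) array; row j holds the polynomial coefficients of mu_j.
    t, y: arrays of n observation times and scalar observations.
    """
    values = np.stack([P.polyval(t, c) for c in coefs])   # shape (k, n)
    sq_dist = (y[None, :] - values) ** 2
    return sq_dist.argmin(axis=0), sq_dist.min(axis=0)

def penalty(c, s):
    """Squared L^2([0, 1]) norm of the s-th derivative of the polynomial c."""
    ds = P.polyder(c, s)                 # coefficients of the s-th derivative
    antiderivative = P.polyint(P.polymul(ds, ds))
    return P.polyval(1.0, antiderivative) - P.polyval(0.0, antiderivative)

def objective(coefs, t, y, lam, s=2):
    """Evaluate f_n from (2) for polynomial trajectories."""
    _, min_sq = hard_association(coefs, t, y)
    return min_sq.mean() + lam * sum(penalty(c, s) for c in coefs)
```

For instance, with the two constant trajectories $\mu_1 \equiv 0$ and $\mu_2 \equiv 1$ the penalty vanishes and `objective` reduces to the mean squared distance from each observation to the nearer of the two levels.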

As with many ill-posed inverse problems with a data association component, recovering the 'true' values
of the (infinite-dimensional) parameters is in general infeasible. Two approaches are possible: to impose
strong parametric assumptions, reducing the problem to that of inferring a (finite-dimensional) collection
of parameters (which will perform poorly when those assumptions are inappropriate) or to proceed
nonparametrically, optimising a cost function which balances the trade-off between a good fit to the data
and regularity of the solution (which requires the precise specification of the notion of regularity). In this
paper we pursue the second route, showing that in the large data limit the proposed estimators behave
well. The main contribution of this paper is to establish the stability of k-means like estimators to the
SDA problem.

Although exact solution of the underlying optimization problem is NP-hard even in benign
Euclidean settings [17], the computational cost of iterative numerical approximation has been shown to
have a polynomial (smoothed) cost in certain Euclidean settings, e.g. [3], and in practice the
performance is often much better than these bounds would suggest: it is accepted to be a numerically efficient
method for obtaining approximate solutions (i.e. local minimizers). Our empirical experience is that
this property holds also within the context considered by this paper. Our focus is upon the asymptotic
properties of the ideal estimator and it is beyond the scope of this paper to upper bound the
computational complexity of the numerical iteration scheme. We do however point out that a key advantage of
the k-means method is that it reduces the problem of solving the multiple target problem ($k > 1$) to the
problem of repeatedly solving the single target problem ($k = 1$) which can be done efficiently with, for
example, splines.
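
The reduction to repeated single target problems can be sketched as follows. This is an illustrative sketch only, under assumptions that are not from the paper: scalar observations ($d = 1$), and a finite polynomial basis $1, t, \dots, t^{m-1}$ standing in for $H^s$ (in place of the splines mentioned above), so that each single target update is a small penalized least squares problem. All function names are hypothetical.

```python
import numpy as np
from math import factorial

def design(t, m):
    # Columns 1, t, ..., t^(m-1): a finite polynomial stand-in for H^s.
    return np.vander(t, m, increasing=True)

def penalty_matrix(m, s):
    # omega[a, b] = integral over [0, 1] of (t^a)^(s) (t^b)^(s) dt.
    omega = np.zeros((m, m))
    for a in range(s, m):
        for b in range(s, m):
            ca = factorial(a) / factorial(a - s)
            cb = factorial(b) / factorial(b - s)
            omega[a, b] = ca * cb / (a + b - 2 * s + 1)
    return omega

def kmeans_trajectories(t, y, init_coefs, lam, s=2, max_iter=100):
    """Alternate hard association and single-target smoothing until stable."""
    coefs = np.array(init_coefs, dtype=float)
    k, m = coefs.shape
    n = len(t)
    X, omega = design(t, m), penalty_matrix(m, s)
    labels = np.full(n, -1)
    for _ in range(max_iter):
        # Assignment step: attach each observation to the nearest trajectory.
        new_labels = ((y[None, :] - coefs @ X.T) ** 2).argmin(axis=0)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: k independent penalized least squares fits (k = 1 problems).
        for j in range(k):
            Xj, yj = X[labels == j], y[labels == j]
            if len(yj) == 0:
                continue  # keep the previous center if a cluster empties
            A = Xj.T @ Xj / n + lam * omega
            coefs[j] = np.linalg.lstsq(A, Xj.T @ yj / n, rcond=None)[0]
    return coefs, labels
```

Each pass can only decrease the value of (2) restricted to this polynomial basis, so the iteration terminates at a local minimizer; as noted above, no claim of global optimality is made. The separation constraint defining $\Theta$ is not enforced in this sketch.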

There are of course several variations of the k-means method, e.g. fuzzy C-means clustering [6]
(a soft version of k-means closely related to the EM algorithm [19]), k-medians clustering [8] (an $L^1$
version of k-means), Minkowski metric weighted k-means [18], for which the analysis, particularly
the convergence result in Theorem 3.1, could be easily adapted. Indeed, for bounded noise, the weak
convergence of k-medians clustering is a special case of [42] and to extend the result to unbounded noise
one can follow the strategy given in the proof of Theorem 3.4. The strong convergence and rate of
convergence will require a different approach as one loses differentiability when going from $L^2$ to $L^1$.

The choice of regularization scheme and, in particular, of $\lambda$ is not straightforward. For $k = 1$ there
are many results in the spline literature on the selection of $\lambda = \lambda_n$ and the resulting asymptotic behavior
as $n \to \infty$, see for example [1, 11-13, 29, 33, 37-40, 43-45, 47]. In this case one has $\lambda_n \to 0$ and can
expect $\mu^n$ to converge to $\mu^\dagger$. Convergence is either with respect to a Hilbert scale, e.g. $L^2$, or in the
dual space, i.e. weak convergence. Using a Hilbert scale in effect measures the convergence in a norm
weaker than $H^s$. We remark that when $k > 1$ and $\lambda_n \to 0$ we would expect that minimizers $\mu^n$ converge
to a minimizer $\mu^*$ of

$$ \int_0^1 \int_{\mathbb{R}^d} \bigwedge_{j=1}^k |y - \mu_j(t)|^2 \, \phi_Y(y|t) \, \phi_T(t) \, \mathrm{d}y \, \mathrm{d}t. $$

In particular we do not expect that $\mu^* = \mu^\dagger$; indeed even the k-means in Euclidean spaces is known to
be asymptotically biased. In this paper we do not take $\lambda_n \to 0$ which adds a further bias.

The approach we take, as is common in settings in which smooth solutions are expected, is to
penalize the derivative. By Taylor's Theorem we can write $H^s = \mathcal{H}_0 \oplus \mathcal{H}_1$ where

$$ \mathcal{H}_0 = \operatorname{span}\left\{ \zeta_i(t) = \frac{t^i}{i!} : i = 0, 1, \dots, s - 1 \right\} $$

$$ \mathcal{H}_1 = \left\{ g \in H^s : \nabla^i g(0) = 0 \text{ for all } i = 0, 1, \dots, s - 1 \right\}. $$

We use $\|\cdot\|_1 = \|\nabla^s \cdot\|_{L^2}$ as the norm on $\mathcal{H}_1$ and denote the $\mathcal{H}_0$ norm by $\|\cdot\|_0$, and therefore we use
the norm $\|\cdot\|_{H^s} = \|\cdot\|_0 + \|\cdot\|_1$ on $H^s$ (which is equivalent to the usual Sobolev norm). Since $\mathcal{H}_0$ is
finite dimensional we are free to use any norm we choose without changing the topology. We can view
$H^s = \mathcal{H}_0 \oplus \mathcal{H}_1$ as a multiscale decomposition of $H^s$. The polynomial component represents a coarse
approximation. The regularization penalizes oscillations on the fine scale, i.e. in $\mathcal{H}_1$.






In the case $k = 1$, $f_n$ is quadratic and one can find an explicit representation of $\mu^n$, i.e. there exists a
random function $G_{n,\lambda}$ such that with probability one $\mu^n = G_{n,\lambda} \nu^n$ for some function $\nu^n$ which depends
on the data. When $k > 1$ the problem is no longer convex and the methodology used in the $k = 1$ case
fails.

The first result of this paper (Theorem 3.1) is a weak convergence result: we show that there exists
$\mu^\infty \in \Theta$ such that (up to subsequences) $\mu^n \rightharpoonup \mu^\infty$ a.s. in $H^s$ and $\mu^\infty$ is a minimizer of $f_\infty$ defined by

$$ f_\infty(\mu) = \int_0^1 \int_{\mathbb{R}^d} \bigwedge_{j=1}^k |y - \mu_j(t)|^2 \, \phi_Y(y|t) \, \phi_T(t) \, \mathrm{d}y \, \mathrm{d}t + \lambda \sum_{j=1}^k \|\nabla^s \mu_j\|_{L^2}^2. \tag{3} $$

One should note that if $\mu^\infty = (\mu^\infty_1, \dots, \mu^\infty_k)$ is a minimizer of $f_\infty$ then so is $\tilde{\mu}^\infty = (\mu^\infty_{\rho(1)}, \dots, \mu^\infty_{\rho(k)})$
for any permutation $\rho : \{1, \dots, k\} \to \{1, \dots, k\}$ and therefore we do not expect uniqueness of the
minimizer. Considering the law of large numbers the limit is natural. The functional $f_\infty$ can be
seen as a limit of $f_n$, the nature of which will be made rigorous in Section 3. The second result is to
go from almost sure weak convergence to strong convergence in probability. In other words, we obtain
convergence of the minimizing sequence in a stronger topology at the expense of considering a weaker
mode of stochastic convergence.

We recall that one motivation for considering the minimization problem (2) is to embed the problem
into a framework that allows the application of the k-means method. Large data limits for the k-means
have been studied extensively in finite dimensions, see for example [2, 5, 10, 25, 31, 34-36]. There
are fewer results for the infinite dimensional case, with [4, 7, 14, 15, 22, 26-28, 30, 41, 42] the only
results known to the authors. Of these, only [42] can be applied to finite dimensional data and infinite
dimensional cluster centers, but it required bounded noise and furthermore the conclusions were limited to
weak convergence. The first contribution of this paper is to extend this convergence result to unbounded
noise for the SDA problem (Section 3). We point out that [4, 7, 26, 28] give results for the convergence
and rates of convergence of the minimum $\min f_n$ (in infinite dimensional settings) and [27] gives results
for the convergence of the minimizers.

The result of Theorem 4.1 is that, up to subsequences, the convergence is strong in $H^s$. The final
result is to show that the rate of convergence is of order $\frac{1}{\sqrt{n}}$ in probability, i.e.

$$ \|\mu^n - \mu^\infty\|_{(H^s)^k} = O_p\left(\frac{1}{\sqrt{n}}\right). $$

This is closely related to the central limit theorem first proved for the k-means method by Pollard [36]
for Euclidean data. We extend his methodology to cluster centers in $H^s$ to prove our rate of convergence
result and in doing so provide a theoretical justification for using this method in the more complex
scenario which we consider and, in particular, for using such approaches to address post hoc tracking of
multiple targets using k-means type algorithms. As with Pollard's finite dimensional result we require
an assumption on the positive definiteness of the second derivative of the limiting function $f_\infty$.

In the next section we remind the reader of some preliminary material which underpins our main 
results. Section 3 contains the weak convergence result. In Section 4 we go from weak convergence to 
strong convergence with rates. 

2 Preliminaries 

2.1 Notation 

The Borel $\sigma$-algebra on $[0,1] \times \mathbb{R}^d$ is denoted $\mathcal{B}([0,1] \times \mathbb{R}^d)$ and the set of probability measures on
$([0,1] \times \mathbb{R}^d, \mathcal{B}([0,1] \times \mathbb{R}^d))$ by $\mathcal{P}([0,1] \times \mathbb{R}^d)$. Our main results concern sequences of data $\{(t_i, y_i)\}_{i=1}^\infty$
sampled independently with common law $P \in \mathcal{P}([0,1] \times \mathbb{R}^d)$ which is assumed to have a Lebesgue
density, $\phi((t, y)) = \phi_Y(y|t) \phi_T(t)$. We work throughout on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ rich enough
to support a countably infinite sequence of such observations, $(t_i, y_i) : \Omega \to [0,1] \times \mathbb{R}^d$. All random
elements are defined upon this common probability space and all stochastic quantifiers are to be
understood as acting with respect to $\mathbb{P}$ unless otherwise stated. With a small abuse of notation we say
$(t_i, y_i) \in [0,1] \times \mathbb{R}^d$.

We will define the space $\Theta \subset (H^s)^k$ in Section 3. The Sobolev space $H^s$ is given by

$$ H^s := \left\{ \mu : [0,1] \to \mathbb{R}^d \text{ s.t. } \nabla^i \mu \text{ is absolutely continuous for } i = 0, 1, \dots, s - 1 \text{ and } \nabla^s \mu \in L^2 \right\}. $$

Note that data is of the form $\{(t_i, y_i)\}_{i=1}^n \subset [0,1] \times \mathbb{R}^d$.

We denote weak convergence by $\rightharpoonup$: if $\nu_n, \nu \in H^s$ satisfy $F(\nu_n) \to F(\nu)$ for all $F \in (H^s)^*$ then
$\nu_n \rightharpoonup \nu$. A sequence of probability measures $P_n$ is said to weakly converge to $P$ if for all bounded and
continuous functions $h$ we have

$$ P_n h \to P h, $$

where we write $Ph = \int h(x) \, P(\mathrm{d}x)$. If $P_n$ weakly converges to $P$ then we write $P_n \Rightarrow P$.

We use the following standard definitions for rates of convergence. 

Definition 2.1. We define the following.

(i) For deterministic sequences $a_n$ and $r_n$, where $r_n$ are positive and real valued, we write $a_n = O(r_n)$
if $\frac{a_n}{r_n}$ is bounded. If $\frac{a_n}{r_n} \to 0$ as $n \to \infty$ we write $a_n = o(r_n)$.

(ii) For random sequences $a_n$ and $r_n$, where $r_n$ are positive and real valued, we write $a_n = O_p(r_n)$
if $\frac{a_n}{r_n}$ is bounded in probability: for all $\epsilon > 0$ there exist deterministic constants $M_\epsilon$, $N_\epsilon$ such that

$$ \mathbb{P}\left( \frac{|a_n|}{r_n} \ge M_\epsilon \right) \le \epsilon \quad \forall n \ge N_\epsilon. $$

If $\frac{a_n}{r_n} \to 0$ in probability: for all $\epsilon > 0$

$$ \mathbb{P}\left( \frac{|a_n|}{r_n} \ge \epsilon \right) \to 0 \quad \text{as } n \to \infty $$

we write $a_n = o_p(r_n)$.

When $a = a(r)$ can be written as a function of $r$ we will often write $a = O(r)$ or $a = o(r)$ to mean
for any sequence $r_n \to 0$ that $a_n := a(r_n)$ satisfies $a_n = O(r_n)$ or $a_n = o(r_n)$ respectively.
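
As a generic illustration of the $O_p$ notation (not specific to the estimators studied here), the sketch below takes $a_n$ to be the absolute error of the sample mean of $n$ standard normal draws and $r_n = n^{-1/2}$. By the central limit theorem $a_n = O_p(n^{-1/2})$, so a fixed high quantile of $a_n / r_n$ should stay bounded as $n$ grows. The function name and the choice of distribution are assumptions made for the example.

```python
import numpy as np

def scaled_error_quantile(n, reps=1000, level=0.95, seed=0):
    """Monte Carlo estimate of a quantile of a_n / r_n with r_n = n^(-1/2)."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((reps, n))
    a_n = np.abs(samples.mean(axis=1))   # |sample mean - true mean|, true mean is 0
    r_n = n ** -0.5
    return np.quantile(a_n / r_n, level)
```

Calling `scaled_error_quantile` for increasing `n` should return values of similar size, which plays the role of the constant $M_\epsilon$ in part (ii) with $\epsilon = 0.05$.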

2.2 $\Gamma$-Convergence

Our proof of convergence will use a variational approach. In particular the natural convergence for a
sequence of minimization problems is $\Gamma$-convergence. The $\Gamma$-limit can be understood as the 'limiting
lower semi-continuous envelope'. It is particularly useful when studying highly oscillatory functionals
when there will often be no strong limit and the weak limit (if it exists) will average out oscillations
and therefore change the behavior of the minimum and minimizers. See [9, 16] for an introduction to
$\Gamma$-convergence and [23, 24, 42] for applications of $\Gamma$-convergence to problems in statistical inference.
We will apply the following definition and theorem to $\Theta \subset \mathcal{H} = (H^s)^k$.

Definition 2.2 ($\Gamma$-convergence [9, Definition 1.5]). Let $\mathcal{H}$ be a Banach space and $\Theta \subset \mathcal{H}$ be a weakly
closed set. A sequence $f_n : \Theta \to \mathbb{R} \cup \{\pm\infty\}$ is said to $\Gamma$-converge on $\Theta$ to $f_\infty : \Theta \to \mathbb{R} \cup \{\pm\infty\}$ with
respect to weak convergence on $\mathcal{H}$, and we write $f_\infty = \Gamma\text{-}\lim_n f_n$, if for all $\nu \in \Theta$ we have

(i) (lim inf inequality) for every sequence $(\nu_n) \subset \Theta$ weakly converging to $\nu$

$$ f_\infty(\nu) \le \liminf_{n} f_n(\nu_n); $$

(ii) (recovery sequence) there exists a sequence $(\nu_n)$ weakly converging to $\nu$ such that

$$ f_\infty(\nu) \ge \limsup_{n} f_n(\nu_n). $$


When it exists the $\Gamma$-limit is always weakly lower semi-continuous [9, Proposition 1.31] and
therefore achieves its minimum on any weakly compact set. An important property of $\Gamma$-convergence is that
it implies the convergence of minimizers. In particular, we will make use of the following result which
can be found in [9, Theorem 1.21].

Theorem 2.3 (Convergence of Minimizers). Let $\mathcal{H}$ be a Banach space, $\Theta \subset \mathcal{H}$ be a weakly closed set
and $f_n : \Theta \to \mathbb{R} \cup \{\pm\infty\}$ be a sequence of functionals. Assume there exists a weakly compact subset
$K \subset \Theta$ with

$$ \inf_{\Theta} f_n = \inf_{K} f_n \quad \forall n \in \mathbb{N}. $$

If $f_\infty = \Gamma\text{-}\lim_n f_n$ and $f_\infty$ is not identically $\pm\infty$ then

$$ \min_{\Theta} f_\infty = \lim_{n} \inf_{\Theta} f_n. $$

Furthermore if $\mu^n \in K$ minimizes $f_n$ then any weak limit point is a minimizer of $f_\infty$.


2.3 The Gateaux Derivative 

As in Section 2.2 we will apply the following to $\Theta \subset \mathcal{H} = (H^s)^k$.


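Definition 2.4. We say that $f : \mathcal{H} \to \mathbb{R}$ is Gateaux differentiable at $\mu \in \mathcal{H}$ in direction $\nu \in \mathcal{H}$ if the
limit

$$ \partial f(\mu; \nu) = \lim_{r \to 0^+} \frac{f(\mu + r\nu) - f(\mu)}{r} $$

exists. We may define second order derivatives by

$$ \partial^2 f(\mu; \nu, \omega) = \lim_{r \to 0^+} \frac{\partial f(\mu + r\omega; \nu) - \partial f(\mu; \nu)}{r} $$

for $\mu, \nu, \omega \in \mathcal{H}$. In cases where the second derivative does not necessarily exist we will define $\partial^2_- f$ by

$$ \partial^2_- f(\mu; \nu, \omega) = \liminf_{r \to 0^+} \frac{\partial f(\mu + r\omega; \nu) - \partial f(\mu; \nu)}{r}. $$

To simplify notation, we write:

$$ \partial^2_- f(\mu; \nu) := \partial^2_- f(\mu; \nu, \nu). $$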
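
The one-sided difference quotients in Definition 2.4 can be approximated numerically. The sketch below is an illustration only, under the assumption that the Banach space is replaced by $\mathbb{R}^m$ (for example, the values of a trajectory on a grid); the function names and the step sizes `r` and `h` are arbitrary choices, not quantities from the paper.

```python
import numpy as np

def gateaux(f, mu, nu, r=1e-6):
    """One-sided difference quotient approximating the derivative of f at mu in direction nu."""
    return (f(mu + r * nu) - f(mu)) / r

def gateaux2(f, mu, nu, omega, r=1e-4, h=1e-4):
    """Difference quotient of the first derivative, taken along omega."""
    first = lambda point: gateaux(f, point, nu, r=h)
    return (first(mu + r * omega) - first(mu)) / r
```

For the quadratic $f(\mu) = |\mu|^2$ one has $\partial f(\mu; \nu) = 2 \mu \cdot \nu$ and $\partial^2 f(\mu; \nu, \omega) = 2 \nu \cdot \omega$, which the two functions reproduce up to discretization and rounding error.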

Theorem 2.5. Let $\mu, \nu \in \mathcal{H}$. If $f : \mathcal{H} \to \mathbb{R}$ is continuously Gateaux differentiable on the set
$\{t\mu + (1 - t)\nu : t \in [0,1]\}$ then

$$ f(\nu) \ge f(\mu) + \partial f(\mu; \nu - \mu) + \frac{1}{2} \partial^2_- f\big((1 - t^*)\mu + t^*\nu; \nu - \mu\big) $$

for some $t^* \in [0,1]$.

Proof. The theorem is only a slight generalisation of Taylor's theorem. Indeed, if there exists $t \in [0,1]$
such that $\partial^2_- f((1 - t)\mu + t\nu; \nu - \mu) = -\infty$ then we have nothing to prove. So we assume
$\partial^2_- f((1 - t)\mu + t\nu; \nu - \mu) > -\infty$ for all $t \in [0,1]$ and define $g(t) = f((1 - t)\mu + t\nu)$. Then we can show
that $g(1) = f(\nu)$, $g(0) = f(\mu)$, $g'(0) = \partial f(\mu; \nu - \mu)$ and $g''_-(t) = \partial^2_- f((1 - t)\mu + t\nu; \nu - \mu)$ where
we define

$$ g''_-(t) = \liminf_{r \to 0^+} \frac{g'(t + r) - g'(t)}{r}. \tag{4} $$

Hence we can equivalently show that $g(1) \ge g(0) + g'(0) + \frac{1}{2} g''_-(t^*)$ for some $t^* \in [0,1]$. Define
$J = 2(g(1) - g(0) - g'(0))$ and we are left to show $J \ge g''_-(t^*)$.

Let

$$ F(t) = g(t) + g'(t)(1 - t) + \frac{J}{2}(1 - t)^2 - g(1) $$

and note that, by definition of $J$, $F(0) = F(1) = 0$. Since $F'_-(t) = (1 - t)(g''_-(t) - J)$ (where $F'_-$ is
defined analogously to (4)), if we can show there exists $t^* \in (0,1)$ such that $F'_-(t^*) \le 0$ we are
done. One can easily show that if $F'_-(t) > 0$ for all $t$ then $F$ is strictly increasing, which contradicts
$F(1) = F(0)$, and so there must exist such a $t^*$. □


3 Weak Convergence 

To show weak convergence we apply Theorem 2.3. The following two subsections prove that the
conditions required to apply this theorem, i.e. that $f_\infty$ is the $\Gamma$-limit of $f_n$ and that the minimizers $\mu^n$ are
uniformly bounded, hold with probability one.

For a fixed $\delta > 0$ we define the set $\Theta$ to be the set of functions in $(H^s)^k$ which have minimum
separation distance of $\delta$:

$$ \Theta = \left\{ \mu \in (H^s)^k : |\mu_j(t) - \mu_l(t)| \ge \delta \ \ \forall t \in [0,1] \text{ and } j \ne l \right\}. \tag{5} $$

For $d = 1$ this is a strong assumption as we restrict ourselves to trajectories that do not intersect. When
considering the tracking of real objects in 2 or more dimensions, the assumption is typically physically
reasonable. For example if $\mu_j$ are to represent trajectories of extended objects by modelling the location
of the centroid, it is natural to require a minimum separation of those centroids on a scale corresponding
to the extent of the objects in question.

In practical implementations the constraint could be difficult to implement, but it is straightforward
to check whether it is satisfied post hoc. For a wide range of distributions on the data it is reasonable
to expect that any two cluster centers obtained by numerical procedures will not intersect and therefore
have a minimum separation distance. Of course, this separation distance is only known with posterior
knowledge and not prior knowledge as we assume here. We expect that one could improve this reasoning
to state explicitly that with high probability any two cluster centers are at least $\delta^*$ apart for some $\delta^*$ that
depends upon the distribution of the data. We do not attempt to prove any such statement here. Such
a statement would imply that one could carry out the classification using a k-means method without
directly imposing the constraint.
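
A post hoc check of the separation constraint can be sketched as follows. This is a hypothetical helper, not part of the paper: it assumes the fitted cluster centers are available as callables of time and evaluates them on a uniform grid, so the returned value is an upper bound on the true minimum separation over $[0,1]$, which a fine grid approximates.

```python
import numpy as np

def min_separation(trajectories, grid_size=1001):
    """Smallest pairwise distance between trajectories over a uniform time grid.

    trajectories: list of callables; each maps an array of times to positions,
        with shape (grid_size,) for d = 1 or (grid_size, d) in general.
    """
    t = np.linspace(0.0, 1.0, grid_size)
    values = [np.asarray(mu(t), dtype=float).reshape(grid_size, -1)
              for mu in trajectories]
    best = np.inf
    for j in range(len(values)):
        for l in range(j + 1, len(values)):
            gap = np.linalg.norm(values[j] - values[l], axis=1).min()
            best = min(best, gap)
    return best
```

If the returned value is at least $\delta$ on a sufficiently fine grid, this is evidence that the fitted centers lie in $\Theta$; a value near zero indicates that two fitted trajectories cross or nearly cross.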

We use the assumption in order to infer that the spatial partitioning induced by any set of cluster
centers $\mu \in \Theta$ is such that every element of the partition is non-empty, at every time $t$, i.e. the sets

$$ X_j(t) = \left\{ x \in \mathbb{R}^d : |x - \mu_j(t)| < |x - \mu_i(t)| \text{ for } i \ne j \right\} $$

for $j = 1, \dots, k$ are all non-empty.

First let us show that $\Theta$ is weakly closed in $(H^s)^k$. Take any sequence $\mu^n \in \Theta$ such that $\mu^n \rightharpoonup \mu \in (H^s)^k$.
We have to show $\mu \in \Theta$. Pick $t \in [0,1]$, $j \ne l$ and define $F : \Theta \to \mathbb{R}^d$ by $F : \nu \mapsto \nu_j(t) - \nu_l(t)$;
note that $F$ is in the dual space of $(H^s)^k$ (since $s > 1$). Hence

$$ \delta \le |\mu^n_j(t) - \mu^n_l(t)| = |F(\mu^n)| \to |F(\mu)| = |\mu_j(t) - \mu_l(t)|. $$

Therefore $\mu \in \Theta$. Furthermore we can show that $f_n$, $f_\infty$ are weakly lower semi-continuous [42,
Propositions 4.8 and 4.9] hence they obtain their minimizers over weakly compact subsets of $\Theta$. We will show
that minimizers are contained in a bounded, and hence weakly compact, set and therefore there exist
minimizers of $f_n$ and $f_\infty$ on $\Theta$.

We now state our assumptions. 

Assumptions. 1. The data sequence $(t_i, y_i)$ is independent and identically distributed in accordance
with the model (1), with $\mu^\dagger \in (L^\infty)^k$, $\varphi(i) \sim \mathrm{Cat}(p)$, $\epsilon_i \sim \phi_0$, $t_i \sim \phi_T$; $\varphi(i)$, $\epsilon_i$ and $t_i$ are
mutually independent, and $(\varphi(i), \epsilon_i, t_i)$, $(\varphi(j), \epsilon_j, t_j)$ are independent for $i \ne j$. We assume $\phi_0$
and $\phi_T$ are continuous densities with respect to the Lebesgue measure on $\mathbb{R}^d$ and $[0,1]$ respectively
and use the same symbols to refer to these densities and to their associated measures.

2. The density $\phi_0$ is centered and has finite second moments.

3. For all $\epsilon \in \mathbb{R}^d$, $\phi_0(\epsilon) > 0$.

4. There exists $a < -d - 3$ and $c_1$ such that $\sup_{t \in [0,1]} \phi_Y(y|t) \le c_1 |y|^a$.
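
Observe that

$$ \begin{aligned}
f_n(\mu^\dagger) &= \frac{1}{n} \sum_{i=1}^n \bigwedge_{j=1}^k |\mu^\dagger_j(t_i) - y_i|^2 + \lambda \sum_{j=1}^k \|\nabla^s \mu^\dagger_j\|_{L^2}^2 \\
&\le \frac{1}{n} \sum_{i=1}^n |\mu^\dagger_{\varphi(i)}(t_i) - y_i|^2 + \lambda \sum_{j=1}^k \|\nabla^s \mu^\dagger_j\|_{L^2}^2 \\
&= \frac{1}{n} \sum_{i=1}^n |\epsilon_i|^2 + \lambda \sum_{j=1}^k \|\nabla^s \mu^\dagger_j\|_{L^2}^2 \\
&\to \mathrm{Var}(\epsilon_1) + \lambda \sum_{j=1}^k \|\nabla^s \mu^\dagger_j\|_{L^2}^2 =: \alpha < \infty
\end{aligned} $$

where the convergence is almost surely by the strong law of large numbers. Hence Assumption 2 implies
that there exists $N$ such that $\min_{\mu \in \Theta} f_n(\mu) < \alpha + 1$ for $n \ge N$ and $N < \infty$ with probability one
(although $N$ could depend on the sequence $\{t_i, y_i\}_{i=1}^\infty$ and so we could have $\sup_{\omega \in \Omega} N = \infty$).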

To simplify our proofs we use Assumption 3 although the results of this paper can be proved without
it. The assumption is used in bounding the minimizers of $f_n$. Clearly if $\phi_0$ has bounded support then
each $y_i$ is uniformly bounded (a.s.) and one can show that $|\mu^n(t)|$ is bounded uniformly in $n$ and $t$ (a.s.).
Assumption 3 can be relaxed at the expense of some trivial but notationally messy modifications.

Assumption 4 is used in the next section to uniformly control the decay in the density $\phi_Y$. In particular
the assumption allows us to bound the error due to restricting to bounded sets. Although Assumption 4
implies that $\phi_0$ has at least two moments we include the second moment condition in Assumption 2 as
the decay in density is not needed until later sections.

Note the second moment condition implies that $\phi_0$ decays as $|\epsilon| \to \infty$ and therefore, by continuity,
$\phi_0$ is bounded in $L^\infty$.

We now state the main result for this section. The proof is an application of Theorem 2.3 once
we have shown that $f_\infty$ is the $\Gamma$-limit (Theorem 3.2) and established the uniform bound on the set of
minimizers, Theorem 3.4 (which by reflexivity of the space $(H^s)^k$ implies weak compactness).

Theorem 3.1. Define $f_n, f_\infty : \Theta \to \mathbb{R}$ by (2) and (3) respectively, where $\Theta \subset (H^s)^k$ for $s > 1$ is given
by (5). Under Assumptions 1-3 any sequence of minimizers $\mu^n$ of $f_n$ is, with probability one, weakly
compact and any weak limit $\mu^\infty$ is a minimizer of $f_\infty$.

3.1 The $\Gamma$-Limit

We claim the $\Gamma$-limit of $(f_n)$ is given by (3).

Theorem 3.2. Define $f_n, f_\infty : \Theta \to \mathbb{R}$ by (2) and (3) respectively where $\Theta \subset (H^s)^k$ for $s > 1$ is given
by (5). Under Assumptions 1-2

$$ f_\infty = \Gamma\text{-}\lim_{n} f_n $$

for almost every sequence of observations $(t_1, y_1), (t_2, y_2), \dots$.

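Proof. We are required to show that the two inequalities in Definition 2.2 hold with probability 1. In
order to do this we follow [42] and consider a subset of $\Omega$ of full measure, $\Omega'$, and show that both
statements hold for every data sequence obtained from that set.

For clarity let $P(\mathrm{d}(t, y)) = \phi_Y(\mathrm{d}y|t) \phi_T(\mathrm{d}t)$. Let $P_n^{(\omega)}$ be the associated empirical measure arising
from the particular elementary event $\omega$, which we define via its action on any continuous bounded
function $h : [0,1] \times \mathbb{R}^d \to \mathbb{R}$: $P_n^{(\omega)} h = \frac{1}{n} \sum_{i=1}^n h\big(t_i^{(\omega)}, y_i^{(\omega)}\big)$, where $\big(t_i^{(\omega)}, y_i^{(\omega)}\big)$ emphasizes that
these are the observations associated with elementary event $\omega$. Define $g_\mu(t, y) = \bigwedge_{j=1}^k |y - \mu_j(t)|^2$. To
highlight the dependence of $f_n$ on $\omega$ we write $f_n^{(\omega)}$. We can write

$$ f_n^{(\omega)}(\mu) = P_n^{(\omega)} g_\mu + \lambda \sum_{j=1}^k \|\nabla^s \mu_j\|_{L^2}^2 \quad \text{and} \quad f_\infty(\mu) = P g_\mu + \lambda \sum_{j=1}^k \|\nabla^s \mu_j\|_{L^2}^2. $$

We define

$$ \begin{aligned}
\Omega' = {} & \left\{ \omega \in \Omega : P_n^{(\omega)} \Rightarrow P \right\} \cap \left\{ \omega \in \Omega : P_n^{(\omega)}\big(B(0, q)^c\big) \to P\big(B(0, q)^c\big) \ \forall q \in \mathbb{N} \right\} \\
& \cap \left\{ \omega \in \Omega : \int_{(B(0,q))^c} |y|^2 \, P_n^{(\omega)}(\mathrm{d}(t, y)) \to \int_{(B(0,q))^c} |y|^2 \, P(\mathrm{d}(t, y)) \ \forall q \in \mathbb{N} \right\}
\end{aligned} $$

then $\mathbb{P}(\Omega') = 1$ by the almost sure weak convergence of the empirical measure [20] and the strong law
of large numbers.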

Fix $\omega \in \Omega'$ and we start with the lim inf inequality. Let $\mu^n \rightharpoonup \mu$. By Theorem 1.1 in [21] we have

$$ \int_{[0,1] \times \mathbb{R}^d} \liminf_{n \to \infty, (t', y') \to (t, y)} g_{\mu^n}(t', y') \, P(\mathrm{d}(t, y)) \le \liminf_{n \to \infty} \int_{[0,1] \times \mathbb{R}^d} g_{\mu^n}(t, y) \, P_n^{(\omega)}(\mathrm{d}(t, y)). $$

By the same argument as in Proposition 4.8.ii in [42] we have

$$ \liminf_{n \to \infty, (t', y') \to (t, y)} |y' - \mu^n_j(t')|^2 \ge |y - \mu_j(t)|^2. $$

Taking the minimum over $j$ we have

$$ \liminf_{n \to \infty, (t', y') \to (t, y)} g_{\mu^n}(t', y') \ge g_\mu(t, y). $$

And, as norms in Banach spaces are weakly lower semi-continuous, $\liminf_{n \to \infty} \|\nabla^s \mu^n_j\|_{L^2}^2 \ge \|\nabla^s \mu_j\|_{L^2}^2$.
Therefore

$$ \liminf_{n \to \infty} f_n^{(\omega)}(\mu^n) \ge f_\infty(\mu) $$

as required.

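We now establish the existence of a recovery sequence for every $\omega \in \Omega'$ and every $\mu \in \Theta$. Let
$\mu^n = \mu \in \Theta$. Let $\zeta_q$ be a $C^\infty(\mathbb{R}^{d+1})$ sequence of functions such that $0 \le \zeta_q(t, y) \le 1$ for all
$(t, y) \in \mathbb{R}^{d+1}$, $\zeta_q(t, y) = 1$ for $(t, y) \in B(0, q - 1)$ and $\zeta_q(t, y) = 0$ for $(t, y) \notin B(0, q)$. Then the
function $\zeta_q(t, y) g_\mu(t, y)$ is continuous for all $q$. We also have, for any $(t, y) \in [0,1] \times \mathbb{R}^d$,

$$ \begin{aligned}
\zeta_q(t, y) g_\mu(t, y) &\le \zeta_q(t, y) |y - \mu_1(t)|^2 \\
&\le 2 \zeta_q(t, y) \left( |y|^2 + |\mu_1(t)|^2 \right) \\
&\le 2 \zeta_q(t, y) \left( |y|^2 + \|\mu_1\|_{L^\infty([0,1])}^2 \right) \\
&\le 2 |q|^2 + 2 \|\mu_1\|_{L^\infty([0,1])}^2 < \infty
\end{aligned} $$

so $\zeta_q g_\mu$ is a continuous and bounded function, hence by the weak convergence of $P_n^{(\omega)}$ to $P$ we have

$$ P_n^{(\omega)} \zeta_q g_\mu \to P \zeta_q g_\mu $$
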

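as $n \to \infty$ for all $q \in \mathbb{N}$. For all $q \in \mathbb{N}$ we have

$$ \begin{aligned}
\limsup_{n \to \infty} |P_n^{(\omega)} g_\mu - P g_\mu| &\le \limsup_{n \to \infty} |P_n^{(\omega)} g_\mu - P_n^{(\omega)} \zeta_q g_\mu| + \limsup_{n \to \infty} |P_n^{(\omega)} \zeta_q g_\mu - P \zeta_q g_\mu| \\
&\quad + \limsup_{n \to \infty} |P \zeta_q g_\mu - P g_\mu| \\
&= \limsup_{n \to \infty} |P_n^{(\omega)} g_\mu - P_n^{(\omega)} \zeta_q g_\mu| + |P \zeta_q g_\mu - P g_\mu|.
\end{aligned} $$

Therefore,

$$ \limsup_{n \to \infty} |P_n^{(\omega)} g_\mu - P g_\mu| \le \limsup_{q \to \infty} \limsup_{n \to \infty} |P_n^{(\omega)} g_\mu - P_n^{(\omega)} \zeta_q g_\mu| $$

by the dominated convergence theorem. We now show that the right hand side of the above expression
is equal to zero. We have

$$ \begin{aligned}
|P_n^{(\omega)} g_\mu - P_n^{(\omega)} \zeta_q g_\mu| &\le \int_{[0,1] \times \mathbb{R}^d} \mathbb{I}_{(B(0, q-1))^c}(t, y) \, |y - \mu_1(t)|^2 \, P_n^{(\omega)}(\mathrm{d}(t, y)) \\
&\le 2 \int_{[0,1] \times \mathbb{R}^d} \mathbb{I}_{(B(0, q-1))^c}(t, y) \, |y|^2 \, P_n^{(\omega)}(\mathrm{d}(t, y)) \\
&\quad + 2 \|\mu_1\|_{L^\infty([0,1])}^2 \int_{[0,1] \times \mathbb{R}^d} \mathbb{I}_{(B(0, q-1))^c}(t, y) \, P_n^{(\omega)}(\mathrm{d}(t, y)) \\
&\to 2 \int_{[0,1] \times \mathbb{R}^d} \mathbb{I}_{(B(0, q-1))^c}(t, y) \, |y|^2 \, P(\mathrm{d}(t, y)) \\
&\quad + 2 \|\mu_1\|_{L^\infty([0,1])}^2 \int_{[0,1] \times \mathbb{R}^d} \mathbb{I}_{(B(0, q-1))^c}(t, y) \, P(\mathrm{d}(t, y)) \quad \text{as } n \to \infty \\
&\to 0 \quad \text{as } q \to \infty
\end{aligned} $$

where the last limit follows by the monotone convergence theorem and Assumption 2. We have shown

$$ \lim_{n \to \infty} |P_n^{(\omega)} g_\mu - P g_\mu| = 0. $$
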

Hence

$$ f_n^{(\omega)}(\mu) \to f_\infty(\mu) $$

as required. □

3.2 Boundedness

The aim of this subsection is to show that the minimizers of $f_n$ are uniformly bounded in $n$ for almost
every sequence of observations. We divide this into two parts: bounding each of the $\mathcal{H}_0$ and $\mathcal{H}_1$ norms.
The $\mathcal{H}_1$ bound follows easily from the regularization. For the $\mathcal{H}_0$ bound we exploit the equivalence of
norms on finite-dimensional vector spaces to choose a convenient norm on $\mathcal{H}_0$.

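By the argument which followed the assumptions we have, for $n$ sufficiently large and with
probability one, $\min_{\mu \in \Theta} f_n(\mu) < \alpha + 1 < \infty$. Now we let $\mu^n$ be a sequence of minimizers. Then there exists
$\tilde{\Omega} \subset \Omega$ such that $\mathbb{P}(\tilde{\Omega}) = 1$ and for all $\omega \in \tilde{\Omega}$ we have

$$ f_n(\mu^\dagger) = P_n^{(\omega)} g_{\mu^\dagger} + \lambda \sum_{j=1}^k \|\nabla^s \mu^\dagger_j\|_{L^2}^2 \to P g_{\mu^\dagger} + \lambda \sum_{j=1}^k \|\nabla^s \mu^\dagger_j\|_{L^2}^2 \le \alpha. $$

Therefore for all $\omega \in \tilde{\Omega}$ there exists $N = N(\omega)$ such that for $n \ge N$ we have

$$ \lambda \sum_{j=1}^k \|\mu^n_j\|_1^2 \le f_n(\mu^n) \le f_n(\mu^\dagger) \le \alpha + 1. $$

Therefore $\|\mu^n_j\|_1$ is bounded almost surely for each $j$. We are left to show the corresponding result for
$\|\mu^n_j\|_0$.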

The following lemma will be used to establish the main result of this subsection, Theorem 3.4. It
shows that, if for some sequence $\nu^n \in H^s$ we have $\|\nabla^s \nu^n\|_{L^2} \le \sqrt{\alpha}$ and $\|\nu^n\|_0 \to \infty$, then,
up to a subsequence, $|\nu^n(t)| \to \infty$ with the exception of at most finitely many $t \in [0,1]$. When applied
to $\mu^n_j$ this will be used to show that in the limit, if any center is unbounded, then the minimization can
be achieved over $k - 1$ clusters, and hence to provide a contradiction.

Lemma 3.3. Let $\nu^n \in H^s$ satisfy $\|\nabla^s \nu^n\|_{L^2} \le \sqrt{\alpha}$ and $\|\nu^n\|_0 \to \infty$. Then there exists a subsequence
such that, with the exception of at most finitely many $t \in [0,1]$, we have $|\nu^{n_m}(t)| \to \infty$. Furthermore
for each $t \in (0,1)$ with $|\nu^n(t)| \to \infty$ and any $t_n \to t$ we have $|\nu^n(t_n)| \to \infty$.

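Proof. Let the norm on $\mathcal{H}_0$ be given by

$$ \|\nu\|_0 := \sum_{i=0}^{s-1} |\nabla^i \nu(0)|. \tag{6} $$

By Taylor's theorem and the bound on $\|\nabla^s \nu^n\|_{L^2}$ we have

$$ \left| \nu^n(t) - \sum_{i=0}^{s-1} \frac{\nabla^i \nu^n(0)}{i!} t^i \right| \le \sqrt{\alpha}. $$

Now let $Q_n(t) = \sum_{i=0}^{s-1} \frac{\nabla^i \nu^n(0)}{i!} t^i$ and $\hat{Q}_n(t) = \frac{Q_n(t)}{\|Q_n\|_0}$. In particular $\|\hat{Q}_n\|_0 = 1$. Take any
subsequence $n_m$; then since $\frac{\mathrm{d}^i \hat{Q}_{n_m}}{\mathrm{d}t^i}$ are uniformly bounded and equi-continuous for all $i = 0, 1, \dots, s - 1$,
by the Arzelà-Ascoli theorem there exists a further subsequence (which we relabel) for which $\frac{\mathrm{d}^i \hat{Q}_{n_m}}{\mathrm{d}t^i}$
converges uniformly to $\frac{\mathrm{d}^i \hat{Q}}{\mathrm{d}t^i}$ for some $\hat{Q}$ and all $i = 0, 1, \dots, s - 1$. In particular $\frac{\mathrm{d}^{s-1} \hat{Q}}{\mathrm{d}t^{s-1}}$ is a constant
and therefore $\hat{Q}$ is a polynomial of degree at most $s - 1$. It follows that $\|\hat{Q}\|_0 = 1$ and therefore $\hat{Q}$
is not identically zero, hence $\hat{Q}$ has at most $s - 1$ roots. For any $t$ that is not a root of $\hat{Q}$ we have
$|Q_{n_m}(t)| = |\hat{Q}_{n_m}(t)| \, \|Q_{n_m}\|_0 \to \infty$. This implies that $|\nu^{n_m}(t)| \to \infty$.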

Now pick $t \in [0,1]$ with $|\nu^n(t)| \to \infty$ and assume $t_n \to t$. We assume that there exists a
subsequence $n_m$ such that $|Q_{n_m}(t_{n_m})|$ is bounded. By going to a further subsequence (which we relabel) we
assume that $\hat{Q}_{n_m} \to \hat{Q}$ uniformly. Choose $\delta > 0$ sufficiently small; then there exists $\epsilon > 0$ and $N < \infty$
such that for all $s$ with $|s - t| \le \epsilon$ and $n_m \ge N$

$$ |\hat{Q}(s)| \ge \delta, \quad \|\hat{Q}_{n_m} - \hat{Q}\|_{L^\infty} \le \frac{\delta}{2} \quad \text{and} \quad |t_{n_m} - t| \le \epsilon. $$

It follows that

$$ |\hat{Q}_{n_m}(t_{n_m})| \ge |\hat{Q}(t_{n_m})| - |\hat{Q}(t_{n_m}) - \hat{Q}_{n_m}(t_{n_m})| \ge \frac{\delta}{2}. $$

In particular $|Q_{n_m}(t_{n_m})| = \|Q_{n_m}\|_0 \, |\hat{Q}_{n_m}(t_{n_m})| \ge \frac{\delta}{2} \|Q_{n_m}\|_0 \to \infty$. This contradicts the assumption that
$|Q_{n_m}(t_{n_m})|$ is bounded. We have shown that $|\nu^n(t_n)| \to \infty$. □

We proceed to the main result of this subsection. 

Theorem 3.4. Define $f_n, f_\infty : \Theta \to \mathbb{R}$, where $\Theta \subset (H^s)^k$ for $s > 1$ is given by (5), by (2) and (3)
respectively. Let $\mu^n$ be a minimizer of $f_n$ then, under Assumptions 1-3, for almost every sequence of
observations there exists a constant $M < \infty$ such that $\|\mu^n\|_{(H^s)^k} \le M$ for all $n$.

Proof. As in the proof of Theorem 3.2 we let $\omega \in \Omega''$ where

f 1 n 

Ll" = < uj € D! : — ^ ef —> Var(ei) 
l n i =t 

n (n c£W . \u € Si' : P<“) (b (c, 0) -> p (b (c, 0) }) 

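where $\Omega'$ is defined in the proof of Theorem 3.2. We have $\mathbb{P}(\Omega'') = 1$. For the remainder of the proof we assume $\omega \in \Omega''$. Then there exists $N_\omega < \infty$ such that $f_n^{(\omega)}(\mu^n) \le \alpha + 1$ for all $n \ge N_\omega$. Hence, for sufficiently large $n$,

$$\lambda \sum_{j=1}^{k} \|\nabla^s \mu^n_j\|_{L^2}^2 \le \alpha + 1.$$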

It remains to show the $\mathcal{H}_0$ bound. The structure of the proof is similar to [27, Lemma 2.1]. We will argue by contradiction. In particular we argue that if a cluster center is unbounded then in the limit the minimum is achieved over the remaining $k-1$ cluster centers.


Step 1: The minimization is achieved over $k-1$ cluster centers. We assume $\sup_j \|\mu^n_j\|_0$ is unbounded; then there exists $j^*$ and a subsequence (which we relabel) such that $\|\mu^n_{j^*}\|_0 \to \infty$. By Lemma 3.3 there exists a further subsequence (again relabelled) such that $|\mu^n_{j^*}(t)| \to \infty$ for all but finitely many $t$. For any such $t$, by Lemma 3.3, we have

$$\lim_{n \to \infty,\; t' \to t} |\mu^n_{j^*}(t')| = \infty.$$

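This easily implies

$$\lim_{n \to \infty,\; (t',y') \to (t,y)} |\mu^n_{j^*}(t') - y'|^2 = \infty$$

for any $y \in \mathbb{R}^d$. Therefore

$$\liminf_{n \to \infty,\; (t',y') \to (t,y)} \left( \bigwedge_{j=1}^{k} |\mu^n_j(t') - y'|^2 - \bigwedge_{j \ne j^*} |\mu^n_j(t') - y'|^2 \right) = 0.$$

Note that the above expression holds for $P$-almost every $(t,y) \in [0,1] \times \mathbb{R}^d$ (as by Lemma 3.3 the collection of $t$ for which $|\mu^n_{j^*}(t)| \not\to \infty$ has Lebesgue measure zero). By Fatou's lemma for weakly converging measures [21, Theorem 1.1] and the above we have

$$\liminf_{n \to \infty} \int_{[0,1] \times \mathbb{R}^d} \left( \bigwedge_{j=1}^{k} |\mu^n_j(t) - y|^2 - \bigwedge_{j \ne j^*} |\mu^n_j(t) - y|^2 \right) P_n^{(\omega)}(dt, dy) \ge 0.$$

Hence

$$\liminf_{n \to \infty} \left( f_n^{(\omega)}(\mu^n) - f_n^{(\omega)}\big((\mu^n_j)_{j \ne j^*}\big) - \lambda \|\nabla^s \mu^n_{j^*}\|_{L^2}^2 \right) \ge 0$$

where we interpret $f_n^{(\omega)}\big((\mu^n_j)_{j \ne j^*}\big)$ accordingly. So,

$$\liminf_{n \to \infty} \left( f_n^{(\omega)}(\mu^n) - f_n^{(\omega)}\big((\mu^n_j)_{j \ne j^*}\big) \right) \ge 0.$$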


Step 2: The contradiction. If we can show that there exists $\epsilon > 0$ such that

$$\liminf_{n \to \infty} \left( f_n^{(\omega)}(\mu^n) - f_n^{(\omega)}\big((\mu^n_j)_{j \ne j^*}\big) \right) \le -\epsilon$$

(i.e. we can do strictly better by fitting $k$ centers than fitting $k-1$ centers) then we can conclude the theorem.

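Now,

$$f_n^{(\omega)}(\mu^n) \le f_n^{(\omega)}(\tilde{\mu}^n) = \frac{1}{n} \sum_{i=1}^{n} \bigwedge_{j=1}^{k} |\tilde{\mu}^n_j(t_i) - y_i|^2 + \lambda \sum_{j \ne j^*} \|\nabla^s \tilde{\mu}^n_j\|_{L^2}^2,$$

where

$$\tilde{\mu}^n_j(t) = \begin{cases} \mu^n_j(t) & \text{for } j \ne j^* \\ c_n & \text{for } j = j^* \end{cases}$$

for a constant $c_n$. By definition, the $\tilde{\mu}^n_j$ must have a minimum separation distance of $\delta$. For now we assume that we can choose $c_n$ such that this criterion is fulfilled. So if $|y_i - c_n| < \frac{\delta}{4}$ then

$$|y_i - c_n| + \frac{\delta}{4} \le |\mu^n_j(t_i) - y_i|$$

for all $j \ne j^*$. And therefore $|y_i - c_n|^2 + \frac{\delta^2}{16} \le |\mu^n_j(t_i) - y_i|^2$, which implies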


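$$\begin{aligned}
f_n^{(\omega)}\big((\mu^n_j)_{j \ne j^*}\big)
&= \frac{1}{n} \sum_{i=1}^{n} \bigwedge_{j \ne j^*} |\mu^n_j(t_i) - y_i|^2 + \lambda \sum_{j \ne j^*} \|\nabla^s \mu^n_j\|_{L^2}^2 \\
&= \frac{1}{n} \sum_{i=1}^{n} \bigwedge_{j \ne j^*} |\mu^n_j(t_i) - y_i|^2\, \mathbb{I}_{(t_i,y_i) \not\sim_n j^*} + \frac{1}{n} \sum_{i=1}^{n} \bigwedge_{j \ne j^*} |\mu^n_j(t_i) - y_i|^2\, \mathbb{I}_{(t_i,y_i) \sim_n j^*} + \lambda \sum_{j \ne j^*} \|\nabla^s \mu^n_j\|_{L^2}^2 \\
&\ge \frac{1}{n} \sum_{i=1}^{n} \bigwedge_{j \ne j^*} |\mu^n_j(t_i) - y_i|^2\, \mathbb{I}_{(t_i,y_i) \not\sim_n j^*} + \frac{1}{n} \sum_{i=1}^{n} |c_n - y_i|^2\, \mathbb{I}_{(t_i,y_i) \sim_n j^*} \\
&\qquad + \frac{\delta^2}{16} P_n^{(\omega)}\Big( [0,1] \times B\big(c_n, \tfrac{\delta}{4}\big) \Big) + \lambda \sum_{j \ne j^*} \|\nabla^s \mu^n_j\|_{L^2}^2 \\
&= f_n^{(\omega)}(\tilde{\mu}^n) + \frac{\delta^2}{16} P_n^{(\omega)}\Big( [0,1] \times B\big(c_n, \tfrac{\delta}{4}\big) \Big).
\end{aligned}$$

Here $(t_i, y_i) \sim_n j$ means that the coordinate $(t_i, y_i)$ is associated to center $\tilde{\mu}^n_j$ in the sense that $(t,y) \sim_n j \Leftrightarrow j = \operatorname{argmin}_{i=1,\dots,k} |y - \tilde{\mu}^n_i(t)|$ (and if the minimum is not uniquely achieved then we take the smallest $j$ such that $j \in \operatorname{argmin}_{i=1,\dots,k} |y - \tilde{\mu}^n_i(t)|$). If we can show that $P_n^{(\omega)}\big([0,1] \times B(c_n, \frac{\delta}{4})\big)$ is bounded away from zero, then the result follows.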

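Since we assumed $\epsilon_1$ has unbounded support on $\mathbb{R}^d$, if we can show that $|c_n| \le M$ for a constant $M$ and $n$ sufficiently large (a.s.) then we can infer the existence of a subsequence such that

$$\liminf_{n \to \infty} P_n^{(\omega)}\Big( [0,1] \times B\big(c_n, \tfrac{\delta}{4}\big) \Big) = \lim_{m \to \infty} P_{n_m}^{(\omega)}\Big( [0,1] \times B\big(c_{n_m}, \tfrac{\delta}{4}\big) \Big)$$

and $c_{n_m}$ converges to some $c$. This implies (after applying Fatou's lemma for weakly converging measures [21, Theorem 1.1])

$$\begin{aligned}
\liminf_{n \to \infty} P_n^{(\omega)}\Big( [0,1] \times B\big(c_n, \tfrac{\delta}{4}\big) \Big)
&\ge \lim_{m \to \infty} P_{n_m}^{(\omega)}\Big( [0,1] \times B\big(c_{n_m}, \tfrac{\delta}{4}\big) \Big) \\
&\ge P\Big( [0,1] \times B\big(c, \tfrac{\delta}{4}\big) \Big) \\
&= \int_0^1 \int_{\mathbb{R}^d} \mathbb{I}_{|y - c| < \frac{\delta}{4}}\, \phi_Y(y|t)\, \phi_T(t)\, dy\, dt.
\end{aligned}$$

By Assumption 3 and the continuity in Assumption 1, there exists $\epsilon' > 0$ such that $\phi_Y(y|t) \ge \epsilon'$ for all $y \in [-M, M]^d$ and $t \in [0,1]$. Hence we may bound the final expression above by

$$\inf_{c \in [-M,M]^d} \int_0^1 \int_{\mathbb{R}^d} \mathbb{I}_{|y - c| < \frac{\delta}{4}}\, \phi_Y(y|t)\, \phi_T(t)\, dy\, dt \ge \epsilon'\, \mathrm{Vol}\Big( B\big(0, \tfrac{\delta}{4}\big) \Big).$$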


We are left to show such an $M$ exists. Assume there exists $M_{k-1}$ such that for all $j \ne j^*$ we have $\|\mu^n_j\|_{H^s} \le M_{k-1}$. By the Sobolev embedding of $H^s$ into $L^\infty$ there exists a constant $C'$ such that $\|\mu\|_{L^\infty} \le C' \|\mu\|_{H^s}$ for all $\mu \in H^s$. And therefore $|\mu^n_j(t)| \le C' M_{k-1}$ for all $j \ne j^*$ and $t \in [0,1]$. Let $C = C' M_{k-1} + \delta$; then it follows that there exists $c_n \in [0, C]^d$ such that $\tilde{\mu}^n_{j^*}(t) = c_n$ and $\tilde{\mu}^n \in \Theta$.

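Now if no such $M_{k-1}$ exists then there exists a second cluster such that $\|\mu^n_{j^{**}}\|_{H^s} \to \infty$ where $j^{**} \ne j^*$. By the same argument

$$\liminf_{n \to \infty} \left( f_n^{(\omega)}(\mu^n) - f_n^{(\omega)}\big((\mu^n_j)_{j \ne j^*, j^{**}}\big) \right) \ge 0$$

and

$$f_n^{(\omega)}(\mu^n) - f_n^{(\omega)}\big((\mu^n_j)_{j \ne j^*, j^{**}}\big) \le -\frac{\delta^2}{16} P_n^{(\omega)}\Big( [0,1] \times B\big(c_n, \tfrac{\delta}{4}\big) \Big) - \frac{\delta^2}{16} P_n^{(\omega)}\Big( [0,1] \times B\big(c'_n, \tfrac{\delta}{4}\big) \Big)$$

for a constant $c'_n$. By induction it is clear that we can find $M_{k-l}$ such that $k - l$ cluster centers are bounded. The result then follows. □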


Remark 3.5. Note that in the above theorem we did not need to assume a correct choice of $k$. If the true number of cluster centers is $k'$ and we incorrectly use $k \ne k'$, then the resulting cluster centers are still bounded. In fact for all the results of this paper the correct choice of $k$ is not necessary: although the minimizers of $f_\infty$ may no longer make physical sense, the problem is still robust in that the conclusions of Theorems 3.1 and 4.1 and Corollary 4.2 hold.
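The association rule and the empirical objective used in the proof of Theorem 3.4 can be sketched in a few lines. This is only an illustration under stated assumptions: we take $f_n$ to have the form that appears in the proof, namely a data term $\frac{1}{n}\sum_i \bigwedge_j |\mu_j(t_i) - y_i|^2$ plus $\lambda \sum_j \|\nabla^s \mu_j\|_{L^2}^2$; the regularization values are supplied by the caller rather than computed; and the names `associate`, `data_fit` and `objective` are hypothetical.

```python
import math


def associate(t, y, centers):
    """Index of the center closest to y at time t; ties go to the smallest index."""
    dists = [math.dist(y, mu(t)) for mu in centers]
    return dists.index(min(dists))  # list.index returns the first (smallest) match


def data_fit(ts, ys, centers):
    """(1/n) * sum_i min_j |y_i - mu_j(t_i)|^2, the data term of the objective."""
    total = 0.0
    for t, y in zip(ts, ys):
        j = associate(t, y, centers)
        total += math.dist(y, centers[j](t)) ** 2
    return total / len(ts)


def objective(ts, ys, centers, lam, penalties):
    """Data term plus lam * sum_j penalties[j].

    penalties[j] stands in for the squared L2 norm of the s-th derivative of
    center j and must be supplied by the caller (an assumption of this sketch).
    """
    return data_fit(ts, ys, centers) + lam * sum(penalties)


# Two hypothetical one-dimensional trajectories: a constant and a straight line.
centers = [lambda t: (0.0,), lambda t: (1.0 + t,)]
ts = [0.0, 0.5, 1.0]
ys = [(0.1,), (1.4,), (0.5,)]
print([associate(t, y, centers) for t, y in zip(ts, ys)])
# Both second derivatives vanish, so the penalties are zero if s = 2 is assumed.
print(objective(ts, ys, centers, lam=0.1, penalties=[0.0, 0.0]))
```

The tie-breaking in `associate` mirrors the convention of taking the smallest index when the minimum is not uniquely achieved.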


4 Weak to Strong Convergence 

We now strengthen the result of the previous section and show that in fact (up to subsequences) convergence of minimizers is strong in $H^s$. Our proof is based on the methodology Pollard used for proving the central limit theorem for the $k$-means method in Euclidean spaces [36]. In his proof, Pollard assumed a positive definiteness condition on the second derivative of what we call in this paper $f_\infty$. Under an analogous condition we are also able to give a rate of convergence on convergent sequences of minimizers. Whether this condition holds will depend on the interplay between the integral over the boundaries of each partition and the size of each partition.

We state the main results of this section now but leave the proofs to the end. 

Theorem 4.1. Define $f_n, f_\infty : \Theta \to \mathbb{R}$, where $\Theta$ is given by (5), by (2) and (3), respectively. Let $\{\mu^n\}_{n \in \mathbb{N}} \subset \Theta$ where $\mu^n$ minimizes $f_n$. Let $\mu^{n_m}$ be any subsequence that weakly converges almost surely to some $\mu^\infty$; then under Assumptions 1-4 we have that, after passing to a further subsequence, $\mu^{n_m}$ converges to $\mu^\infty$ strongly in $H^s$ and in probability.

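Corollary 4.2. If, in addition to the conditions in Theorem 4.1 and where $\mu^\infty$ is a minimizer of $f_\infty$, we assume that there exist $\rho > 0$ and $\kappa > 0$ such that

$$\partial^2 f_\infty(\mu; \nu) \ge \kappa \|\nu\|_{(H^s)^k}^2$$

for all $\mu$ with $\|\mu - \mu^\infty\| \le \rho$, then any sequence $\mu^n$ of minimizers with $\mu^n \to \mu^\infty$ in $H^s$ obeys the rate of convergence

$$\|\mu^n - \mu^\infty\|_{(H^s)^k} = O_p\left( \frac{1}{\sqrt{n}} \right).$$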

For clarity, we will assume that the entire sequence weakly converges in the remainder of this paper to avoid reference to subsequences. Relaxing this assumption is trivial, but notationally cumbersome.

We let $Y_n(\mu) = \sqrt{n}\,(f_n(\mu) - f_\infty(\mu))$ and then, by Taylor expanding around $\mu^\infty$, we have

$$Y_n(\mu^n) = Y_n(\mu^\infty) + \partial Y_n(\mu^\infty; \mu^n - \mu^\infty) + \text{h.o.t.}$$

In Lemma 4.6, using Chebyshev's inequality, we bound the Gateaux derivative of $Y_n$ in probability. Similarly one can Taylor expand $f_\infty$ around $\mu^\infty$. After some manipulation of the Taylor expansion, where we leave the details until the proof of Theorem 4.1, one has

$$\partial^2 f_\infty(\mu^\infty; \mu^n - \mu^\infty) \le f_n(\mu^n) - f_n(\mu^\infty) + O_p\left( \frac{1}{\sqrt{n}} \|\mu^n - \mu^\infty\|_{(L^2)^k} \right).$$

We note that $f_n(\mu^n) - f_n(\mu^\infty) \le 0$. We also show that $2\lambda \|\nabla^s \nu\|_{(L^2)^k}^2 - 2\|\nu\|_{(L^\infty)^k}^2 \le \partial^2 f_\infty(\mu^\infty; \nu)$. Therefore

$$\lambda \|\nabla^s (\mu^n - \mu^\infty)\|_{(L^2)^k}^2 \le O_p\left( \frac{1}{\sqrt{n}} \|\mu^n - \mu^\infty\|_{(L^2)^k} + \|\mu^n - \mu^\infty\|_{(L^\infty)^k}^2 \right).$$

The above expression allows us to convert weak convergence into strong convergence. Lemmata 4.3 and 4.5 provide the first Gateaux derivative and a lower bound on the second Gateaux derivative of $f_\infty$ respectively.

Lemma 4.3. Define $f_\infty$ by (3) and $\Theta \subset (H^s)^k$ for $s \ge 1$ by (5). Then, under Assumptions 1, 2 and 4, for $\mu \in \Theta \cap (L^\infty)^k$, $\nu \in (H^s)^k$ we have that $f_\infty$ is Gateaux differentiable at $\mu$ in the direction $\nu$ with

$$\partial f_\infty(\mu; \nu) = 2 \int_0^1 \int_{\mathbb{R}^d} \big( \mu_{j(t,y)}(t) - y \big) \cdot \nu_{j(t,y)}(t)\, \phi_Y(y|t)\, \phi_T(t)\, dy\, dt + 2\lambda \sum_{j=1}^{k} (\nabla^s \nu_j, \nabla^s \mu_j)$$

where $j(t,y)$ is chosen arbitrarily from the set $\operatorname{argmin}_j |y - \mu_j(t)|$, so that

$$j(t,y) \in \operatorname*{argmin}_{j} |y - \mu_j(t)|. \tag{7}$$


Remark 4.4. Since the $\mu_j$ are continuous, the boundary between each element of the resulting partition is itself continuous and has Lebesgue measure zero. The set on which $j(t,y)$ is not uniquely defined therefore has measure zero. Hence we will treat $j(t,y)$ as though it was uniquely defined.

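Proof of Lemma 4.3: Fix $\mu \in \Theta$, $\nu \in (H^s)^k$ and $r > 0$. We will assume $d \ge 2$. The case when $d = 1$ simplifies as the boundaries between partitions are points and so we exclude the argument. Let $\beta = -\frac{1 + \epsilon}{\alpha + 2 + d}$ where $\epsilon > 0$ is chosen sufficiently small so that $1 - \beta = \frac{\alpha + d + 3 + \epsilon}{\alpha + d + 2} > 0$ (true for any $\epsilon < -(\alpha + d + 3)$). Then

$$\begin{aligned}
\frac{1}{r} \int_{|y| > r^{-\beta}} |y|^2\, \phi_Y(y|t)\, dy
&\le \frac{c_1}{r} \int_{|y| > r^{-\beta}} |y|^{2+\alpha}\, dy \\
&= \frac{c}{r} \int_{r^{-\beta}}^{\infty} t^{2+\alpha+d-1}\, dt \qquad \text{for some } c > 0 \\
&= -\frac{c}{\alpha + 2 + d}\, r^{-\beta(\alpha+2+d) - 1}.
\end{aligned} \tag{8}$$

Since $\alpha + 2 + d < 0$ and $-\beta(\alpha + 2 + d) - 1 = \epsilon > 0$ the above converges to zero as $r \to 0$. Analogously, one can show $\frac{1}{r} \int_{|y| > r^{-\beta}} \phi_Y(y|t)\, dy \to 0$ as $r \to 0$.

Define $j_r(t,y)$ by

$$j_r(t,y) = \operatorname*{argmin}_{j} |y - \mu_j(t) - r\nu_j(t)|.$$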


Then for $(t,y)$ in the interior of the partition associated with $\mu_j$ we have

$$j_r(t,y) = j(t,y) \qquad \text{for } r \text{ sufficiently small.}$$


More precisely consider two points $y_1, y_2 \in \mathbb{R}^d$, with $|y_1 - y_2| \ge \delta$, and let $B_{y_1,y_2}$ be the boundary defined by

$$B_{y_1,y_2} = \left\{ y \in \overline{B}(0, M) : |y - y_1| = |y - y_2| \right\}$$

for a constant $M > 0$. Let $\hat{y}_1 \in B(y_1, Cr)$ and $\hat{y}_2 \in B(y_2, Cr)$. We will denote by $d_H$ the Hausdorff distance between sets in $\mathbb{R}^d$; in particular we wish to bound $d_H(B_{y_1,y_2}, B_{\hat{y}_1,\hat{y}_2})$. Elementary geometry implies that this can be bounded by the Euclidean distance between points on the boundary of each set, in particular

$$d_H(B_{y_1,y_2}, B_{\hat{y}_1,\hat{y}_2}) \le d_H(\partial B_{y_1,y_2}, \partial B_{\hat{y}_1,\hat{y}_2})$$

where

$$\partial B_{y_1,y_2} = \left\{ y \in \mathbb{R}^d : |y| = M \text{ and } |y - y_1| = |y - y_2| \right\}.$$

Without loss of generality assume that $B_{y_1,y_2} \subset \{x : x_1 = 0\}$. (All assumptions other than 4 are rotation and translation invariant. Whilst 4 is rotation invariant, it is not translation invariant, as the constant $c_1$ could increase with the size of the translation. However the cluster centers are bounded in $L^\infty$, so in particular the size of the translation can be bounded. Therefore, up to redefining the constant $c_1$, all the assumptions hold in the rotated and translated coordinate system.) For $d \ge 3$ we consider a cross section at $x_{3:d} = a \in \mathbb{R}^{d-2}$; then there exist constants $\gamma_1, \gamma_2 \in \mathbb{R}$ (depending on $a$) such that $x_1 = \gamma_1 x_2 + \gamma_2$ parametrizes the set $\{x \in B_{\hat{y}_1,\hat{y}_2} : x_{3:d} = a\}$ (for $|a| > M$ the set is empty and we have nothing to prove). Let $\theta_a = |\tan^{-1} \gamma_1| \in [0, \frac{\pi}{2}]$ be the angle between the lines $x_1 = 0$ and $x_1 = \gamma_1 x_2 + \gamma_2$. When $d = 2$ the set $B_{\hat{y}_1,\hat{y}_2}$ is already a straight line in $\mathbb{R}^2$ and it is unnecessary to take a cross section (i.e. $x_{3:d}$ is null and $\theta_a$ is independent of $a$). We will find $\theta^*$ such that $\sin\theta^* = O(r)$ and $\sup_a \theta_a \le \theta^*$; then we can bound the Hausdorff distance by

$$d_H(\partial B_{y_1,y_2}, \partial B_{\hat{y}_1,\hat{y}_2}) \le rC + 2M \sin\theta^* = O(r),$$

the above bound holding as it is the maximum distance that can arise from rotation plus the maximum possible translation of the set $\partial B_{y_1,y_2}$.

Figure 2: The geometry considered in the proof of Lemma 4.3 admits two cases: in the first (left) the intersection of $\ell$ and $\hat{\ell}$ lies between $y_1$ and $y_2$, and in the second (right) it does not.

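Let $\ell$ be the line through $y_1$ and $y_2$ and $\hat{\ell}$ be the line through $\hat{y}_1$ and $\hat{y}_2$. Let $P$ be the point of intersection between $\ell$ and $\hat{\ell}$. The point $P$ exists if and only if the lines $\ell$ and $\hat{\ell}$ are not parallel. The lines $\ell$ and $\hat{\ell}$ are parallel if and only if $\theta = 0$; trivially any choice of $\theta^* \ge 0$ will bound this case. Therefore we assume that $\theta > 0$ and therefore the point $P$ exists.

One can easily show that $\angle y_1 P \hat{y}_1 = \theta$ (the angle between the lines $y_1 P$ and $P \hat{y}_1$ is $\theta$). There are two possibilities: either (1) $P$ is between $y_1$ and $y_2$ or (2) it is not.

In the second case we assume that $|y_2 - P| \le |y_1 - P|$ and therefore $|y_1 - P| \ge \delta$. Let $Q$ be the closest point on $\hat{\ell}$ to $y_1$ (see Figure 2). So $P$, $y_1$, $Q$ form a triangle with $\angle P Q y_1 = \frac{\pi}{2}$, $\angle Q P y_1 = \theta$ and $|Q - y_1| \le |y_1 - \hat{y}_1| \le Cr$. Hence $\sin\theta = \frac{|Q - y_1|}{|y_1 - P|} \le \frac{Cr}{\delta}$.

The first case is similar. Assume that $|y_1 - P| \ge |y_2 - P|$; then $|y_1 - P| \ge \frac{\delta}{2}$. Let $Q$ be the closest point on $\hat{\ell}$ to $y_1$; then $|Q - y_1| \le |y_1 - \hat{y}_1| \le Cr$ and $\angle Q P y_1 = \theta$, $\angle y_1 Q P = \frac{\pi}{2}$. In particular $\sin\theta = \frac{|Q - y_1|}{|y_1 - P|} \le \frac{2Cr}{\delta}$.

In both cases $\sin\theta \le \frac{2Cr}{\delta}$, which implies

$$d_H(B_{y_1,y_2}, B_{\hat{y}_1,\hat{y}_2}) \le d_H(\partial B_{y_1,y_2}, \partial B_{\hat{y}_1,\hat{y}_2}) \le rC + \frac{4MCr}{\delta}.$$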
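The angle bound $\sin\theta \le 2Cr/\delta$ can be checked numerically in the plane. The sketch below is an informal sanity check, not part of the argument: it samples pairs of points at distance at least $\delta$, moves each point by at most $Cr$, and compares the sine of the angle between the original and perturbed lines with $2Cr/\delta$. The bisecting boundaries are perpendicular to these lines, so they meet at the same angle. All names and parameter values are hypothetical.

```python
import math
import random


def sin_angle(y1, y2, z1, z2):
    """Sine of the angle between the line through y1, y2 and the line through z1, z2."""
    vx, vy = y2[0] - y1[0], y2[1] - y1[1]
    wx, wy = z2[0] - z1[0], z2[1] - z1[1]
    cross = vx * wy - vy * wx
    return abs(cross) / (math.hypot(vx, vy) * math.hypot(wx, wy))


def perturb(point, radius, rng):
    """Move a point by a random displacement of length at most `radius`."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    length = rng.uniform(0.0, radius)
    return (point[0] + length * math.cos(angle), point[1] + length * math.sin(angle))


def worst_ratio(delta, C, r, trials=10000, seed=0):
    """Largest observed value of sin(theta) / (2*C*r/delta) over random configurations."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        y1 = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        direction = rng.uniform(0.0, 2.0 * math.pi)
        gap = delta * rng.uniform(1.0, 3.0)  # enforce |y1 - y2| >= delta
        y2 = (y1[0] + gap * math.cos(direction), y1[1] + gap * math.sin(direction))
        z1, z2 = perturb(y1, C * r, rng), perturb(y2, C * r, rng)
        worst = max(worst, sin_angle(y1, y2, z1, z2) / (2.0 * C * r / delta))
    return worst


# A ratio of at most 1 means the bound held in every sampled configuration.
print(worst_ratio(delta=1.0, C=1.0, r=0.05))
```

The sketch requires $2Cr < \delta$ so that the perturbed points remain distinct.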


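Let

$$B(t) = \left\{ y \in \mathbb{R}^d : j(t,y) \text{ is not uniquely defined} \right\}$$

and $X(r,t) = \left\{ y \in B(0, r^{-\beta}) : \mathrm{dist}(y, B(t)) \le \|\nu\|_{(L^\infty)^k} \left( r + \frac{4 r^{1-\beta}}{\delta} \right) \right\}$. By the previous calculation with $C = \|\nu\|_{(L^\infty)^k}$ and $M = r^{-\beta}$, if $j_r(t,y) \ne j(t,y)$ then $\mathrm{dist}(y, B(t)) \le rC + \frac{4MCr}{\delta} = \|\nu\|_{(L^\infty)^k} \left( r + \frac{4 r^{1-\beta}}{\delta} \right)$. And therefore if $y \notin X(r,t)$ then $j_r(t,y) = j(t,y)$.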

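We now partition $X(r,t)$ into $\lceil 2 r^{-\beta-1} \rceil$ subsets (where $\lceil t \rceil$ is the smallest integer greater than or equal to $t$) by defining

$$B^m_{y_1,y_2} = \left\{ y \in B_{y_1,y_2} : \left| y - \frac{y_1 + y_2}{2} \right| \in [(m-1)r, mr] \right\}$$

and

$$X_m(r,t) = \Big\{ y \in X(r,t) : \exists\, i,j \text{ with } \mathrm{dist}\big(y, B^m_{\mu_i(t),\mu_j(t)}\big) \le \|\nu\|_{(L^\infty)^k} \Big( r + \frac{4 r^{1-\beta}}{\delta} \Big) \text{ and } \mathrm{dist}\big(y, B^m_{\mu_i(t),\mu_j(t)}\big) \le \mathrm{dist}\big(y, B^{m'}_{\mu_i(t),\mu_j(t)}\big) \text{ for all } m' \ne m \Big\}.$$

So $X(r,t) \subseteq \bigcup_{m=1}^{\lceil 2 r^{-\beta-1} \rceil} X_m(r,t)$ (assuming $r$ is sufficiently small so that $\|\mu\|_{(L^\infty)^k} \le r^{-\beta}$). This implies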
implies 


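$$\begin{aligned}
&\frac{1}{r} \left| \int_{|y| \le r^{-\beta}} \big( 2y - \mu_{j(t,y)}(t) - \mu_{j_r(t,y)}(t) \big) \cdot \big( \mu_{j(t,y)}(t) - \mu_{j_r(t,y)}(t) \big)\, \phi_Y(y|t)\, dy \right| \\
&\quad = \frac{1}{r} \left| \int_{X(r,t)} \big( 2y - \mu_{j(t,y)}(t) - \mu_{j_r(t,y)}(t) \big) \cdot \big( \mu_{j(t,y)}(t) - \mu_{j_r(t,y)}(t) \big)\, \phi_Y(y|t)\, dy \right| \\
&\quad \le \frac{2}{r} \sum_{m=1}^{\lceil 2 r^{-\beta-1} \rceil} \left( mr + \|\nu\|_{(L^\infty)^k} \Big( r + \frac{4 r^{1-\beta}}{\delta} \Big) \right) \int_{X_m(r,t)} \big| \mu_{j(t,y)}(t) - \mu_{j_r(t,y)}(t) \big|\, \phi_Y(y|t)\, dy.
\end{aligned}$$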

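Now if $y \in X_m(r,t)$ then $\left| y - \frac{\mu_i(t) + \mu_j(t)}{2} \right| \ge (m-1)r$ for some $i,j$ and therefore $|y| \ge (m-1)r - A$ where $\|\mu\|_{(L^\infty)^k} \le A$. In particular

$$\phi_Y(y|t) \le \begin{cases} c_1 (m - 1 - A)^{\alpha} & \text{if } m > A + 1 \\ \|\phi_Y\|_{L^\infty} & \text{else.} \end{cases}$$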


Note that

$$\mathrm{Vol}(X_m(r,t)) \le k(k-1) \big[ \mathrm{Vol}_{d-1}(S(0, mr)) - \mathrm{Vol}_{d-1}(S(0, (m-1)r)) \big]\, \|\nu\|_{(L^\infty)^k} \Big( r + \frac{4 r^{1-\beta}}{\delta} \Big) \lesssim m^{d-1} r^{d-\beta}.$$


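Therefore

$$\begin{aligned}
&\frac{1}{r} \left| \int_{|y| \le r^{-\beta}} \big( 2y - \mu_{j(t,y)}(t) - \mu_{j_r(t,y)}(t) \big) \cdot \big( \mu_{j(t,y)}(t) - \mu_{j_r(t,y)}(t) \big)\, \phi_Y(y|t)\, dy \right| \\
&\quad \le \frac{4 \|\mu\|_{(L^\infty)^k}}{r} \sum_{m=1}^{\lceil 2 r^{-\beta-1} \rceil} \left( mr + \|\nu\|_{(L^\infty)^k} \Big( r + \frac{4 r^{1-\beta}}{\delta} \Big) \right) \int_{X_m(r,t)} \phi_Y(y|t)\, dy \\
&\quad \le \frac{4 \|\mu\|_{(L^\infty)^k} \|\phi_Y\|_{L^\infty}}{r} \sum_{m=1}^{A+1} \left( mr + \|\nu\|_{(L^\infty)^k} \Big( r + \frac{4 r^{1-\beta}}{\delta} \Big) \right) \mathrm{Vol}(X_m(r,t)) \\
&\qquad + \frac{4 c_1 \|\mu\|_{(L^\infty)^k}}{r} \sum_{m=A+2}^{\lceil 2 r^{-\beta-1} \rceil} \left( mr + \|\nu\|_{(L^\infty)^k} \Big( r + \frac{4 r^{1-\beta}}{\delta} \Big) \right) (m - 1 - A)^{\alpha}\, \mathrm{Vol}(X_m(r,t)) \\
&\quad \lesssim \frac{1}{r} \sum_{m=1}^{A+1} \big( rm + r^{1-\beta} \big)\, m^{d-1} r^{d-\beta} + \frac{1}{r} \sum_{m=A+2}^{\lceil 2 r^{-\beta-1} \rceil} \big( rm + r^{1-\beta} \big) (m - 1 - A)^{\alpha}\, m^{d-1} r^{d-\beta} \\
&\quad \lesssim r^{d-2\beta} + r^{d-2\beta} \sum_{m=1}^{\infty} m^{d+\alpha} \\
&\quad = O(r^{d-2\beta})
\end{aligned}$$

with the above following as $r^{d-\beta}$ is dominated by $r^{d-2\beta}$ as $r \to 0$. Since $d - 2\beta \ge 2(1 - \beta) > 0$ the above is $o(1)$.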

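Hence

$$\begin{aligned}
&\frac{1}{r} \left| \int_{\mathbb{R}^d} \Big( |y - \mu_{j_r(t,y)}(t)|^2 - |y - \mu_{j(t,y)}(t)|^2 \Big)\, \phi_Y(y|t)\, dy \right| \\
&\quad \le \frac{1}{r} \left| \int_{|y| \le r^{-\beta}} \Big( |y - \mu_{j_r(t,y)}(t)|^2 - |y - \mu_{j(t,y)}(t)|^2 \Big)\, \phi_Y(y|t)\, dy \right| + o(1) \qquad \text{by (8)} \\
&\quad = \frac{1}{r} \left| \int_{|y| \le r^{-\beta}} \big( 2y - \mu_{j(t,y)}(t) - \mu_{j_r(t,y)}(t) \big) \cdot \big( \mu_{j(t,y)}(t) - \mu_{j_r(t,y)}(t) \big)\, \phi_Y(y|t)\, dy \right| + o(1)
\end{aligned}$$

which converges (uniformly in $t$) to zero.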
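Therefore

$$\begin{aligned}
\partial f_\infty(\mu; \nu)
&= \lim_{r \to 0} \frac{f_\infty(\mu + r\nu) - f_\infty(\mu)}{r} \\
&= \lim_{r \to 0} \frac{1}{r} \Bigg\{ \int_0^1 \int_{\mathbb{R}^d} \Big( |y - \mu_{j_r(t,y)}(t)|^2 - |y - \mu_{j(t,y)}(t)|^2 + r^2 |\nu_{j_r(t,y)}(t)|^2 \\
&\qquad\qquad - 2r \big( y - \mu_{j_r(t,y)}(t) \big) \cdot \nu_{j_r(t,y)}(t) \Big)\, \phi_Y(y|t)\, \phi_T(t)\, dy\, dt + \lambda \sum_{j=1}^{k} \Big( 2r (\nabla^s \nu_j, \nabla^s \mu_j) + r^2 \|\nabla^s \nu_j\|_{L^2}^2 \Big) \Bigg\} \\
&= -2 \int_0^1 \int_{\mathbb{R}^d} \big( y - \mu_{j(t,y)}(t) \big) \cdot \nu_{j(t,y)}(t)\, \phi_Y(y|t)\, \phi_T(t)\, dy\, dt + 2\lambda \sum_{j=1}^{k} (\nabla^s \nu_j, \nabla^s \mu_j)
\end{aligned}$$

by the dominated convergence theorem. □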
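The exponent bookkeeping behind the tail estimate in the proof of Lemma 4.3 can be checked with a short computation. The sketch below is an illustration under assumptions: it solves the stated identity $-\beta(\alpha + 2 + d) - 1 = \epsilon$ for $\beta$, and evaluates the closed form of $\frac{1}{r} \int_{r^{-\beta}}^{\infty} t^{\alpha + d + 1}\, dt$, which should scale like $r^{\epsilon}$. The function names and the parameter values are hypothetical.

```python
def beta_from(alpha, d, eps):
    """Solve -beta * (alpha + 2 + d) - 1 = eps for beta."""
    return -(1.0 + eps) / (alpha + 2 + d)


def scaled_tail(alpha, d, eps, r):
    """Closed form of (1/r) * integral of t**(alpha + d + 1) over t > r**(-beta).

    Valid when alpha + d + 2 < 0, so that the integral converges.
    """
    beta = beta_from(alpha, d, eps)
    p = alpha + d + 2  # negative exponent obtained after integrating
    return (r ** (-beta * p)) / (-p) / r


alpha, d, eps = -6.0, 2, 0.5  # hypothetical values with eps < -(alpha + d + 3) = 1
beta = beta_from(alpha, d, eps)
print(beta, 1.0 - beta)
for r in (1e-1, 1e-2, 1e-3):
    # The scaled tail behaves like r**eps / |alpha + d + 2| and so vanishes as r -> 0.
    print(r, scaled_tail(alpha, d, eps, r), r ** eps / abs(alpha + d + 2))
```

With these values $\beta$ lies strictly between 0 and 1, which is the condition $1 - \beta > 0$ used in the proof.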

Lemma 4.5. Under the same conditions as Lemma 4.3 we have

$$\partial^2 f_\infty(\mu; \nu, \nu) \ge 2\lambda \|\nabla^s \nu\|_{(L^2)^k}^2 - 2 \|\nu\|_{(L^\infty)^k}^2.$$

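Proof. The proof is similar to that of Lemma 4.3 so we only sketch the details. The key step is in showing the following limit converges to zero:

$$\begin{aligned}
&\limsup_{r \to 0} \frac{1}{r} \left| \int_0^1 \int_{\mathbb{R}^d} \Big\{ \big( \mu_{j_r(t,y)}(t) - y \big) \cdot \nu_{j_r(t,y)}(t) - \big( \mu_{j(t,y)}(t) - y \big) \cdot \nu_{j(t,y)}(t) \Big\}\, \phi_Y(y|t)\, \phi_T(t)\, dy\, dt \right| \\
&\quad \le 2 \|\mu\|_{(L^\infty)^k} \|\nu\|_{(L^\infty)^k} \limsup_{r \to 0} \frac{1}{r} \int_0^1 \int_{j_r \ne j} \phi_Y(y|t)\, \phi_T(t)\, dy\, dt \\
&\qquad + 2 \|\nu\|_{(L^\infty)^k} \limsup_{r \to 0} \frac{1}{r} \int_0^1 \int_{j_r \ne j} |y|\, \phi_Y(y|t)\, \phi_T(t)\, dy\, dt.
\end{aligned}$$

As in the proof of Lemma 4.3 we divide $\mathbb{R}^d = B(0, r^{-\beta}) \cup (\mathbb{R}^d \setminus B(0, r^{-\beta}))$ and recall that $X(r,t)$ contains the set where $j_r(t,y) \ne j(t,y)$ in the ball $B(0, r^{-\beta})$ and $X(r,t) \subseteq \bigcup_{m=1}^{\lceil 2 r^{-\beta-1} \rceil} X_m(r,t)$ with $\mathrm{Vol}(X_m(r,t)) = O(m^{d-1} r^{d-\beta})$. The limit

$$\frac{1}{r} \int_0^1 \int_{|y| > r^{-\beta}} (|y| + 1)\, \phi_Y(y|t)\, \phi_T(t)\, dy\, dt \to 0$$

as in the proof of Lemma 4.3. Now,

$$\begin{aligned}
\int_{j_r \ne j,\; |y| \le r^{-\beta}} (|y| + 1)\, \phi_Y(y|t)\, dy
&\le \int_{X(r,t)} (|y| + 1)\, \phi_Y(y|t)\, dy \\
&\le \sum_{m=1}^{A+1} \int_{X_m(r,t)} (|y| + 1)\, \phi_Y(y|t)\, dy + c_1 \sum_{m=A+2}^{\lceil 2 r^{-\beta-1} \rceil} \int_{X_m(r,t)} (|y| + 1)\, |y|^{\alpha}\, dy \\
&\le \sum_{m=1}^{A+1} \|\phi_Y\|_{L^\infty} (A + mr + 1)\, \mathrm{Vol}(X_m(r,t)) + c_1 \sum_{m=A+2}^{\lceil 2 r^{-\beta-1} \rceil} (m - A)(m - 1 - A)^{\alpha}\, \mathrm{Vol}(X_m(r,t)) \\
&= O(r^{d-\beta}).
\end{aligned}$$

Since $d - \beta \ge 2 - \beta > 1$ the above limit is $o(r)$. □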

We now consider $Y_n$. In particular we want to bound $\partial Y_n(\mu^\infty; \mu^n - \mu^\infty)$.

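Lemma 4.6. Define $f_n, f_\infty : \Theta \to \mathbb{R}$ by (2) and (3) respectively where $\Theta$ is given by (5). Take Assumptions 1, 2 and 4 and define

$$Y_n : \Theta \to \mathbb{R}, \qquad Y_n(\mu) = \sqrt{n}\, \big( f_n(\mu) - f_\infty(\mu) \big).$$

Then for $\mu \in \Theta$, $\nu \in (H^s)^k$ we have that $Y_n$ is Gateaux differentiable at $\mu$ in the direction $\nu$ with

$$\partial Y_n(\mu; \nu) = 2\sqrt{n} \left( \int_0^1 \int_{\mathbb{R}^d} \big( y - \mu_{j(t,y)}(t) \big) \cdot \nu_{j(t,y)}(t)\, \phi_Y(y|t)\, \phi_T(t)\, dy\, dt - \frac{1}{n} \sum_{i=1}^{n} \big( y_i - \mu_{j(t_i,y_i)}(t_i) \big) \cdot \nu_{j(t_i,y_i)}(t_i) \right)$$

where $j(t,y)$ is defined by (7). Furthermore, for a sequence $\nu^n$ with

$$\|\nu^n\|_{(L^2)^k} = o_p(1) \qquad \text{and} \qquad \|\nu^n\|_{(H^s)^k} = O_p(1)$$

we have $\partial Y_n(\mu; \nu^n) = O_p(\|\nu^n\|_{(L^2)^k})$.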

Proof. Calculating the Gateaux derivative is similar to Lemma 4.3 and is omitted. By linearity and continuity of $\partial Y_n$ we can write

$$\partial Y_n\left( \mu; \frac{\nu^n}{\|\nu^n\|_{(L^2)^k}} \right) = \sum_{m} \frac{(\nu^n, e_m)}{\|\nu^n\|_{(L^2)^k}}\, \partial Y_n(\mu; e_m)$$

where $e_m$ is the Fourier basis for $(L^2)^k$ (we assume $e_m = (e_{m_1}, \dots, e_{m_k})$ where $e_{m_i}$ is the Fourier basis for $L^2$). Let $V_m = \mathbb{E}\big( \partial Y_n(\mu; e_m) \big)^2$ and $Z_i = \big( y_i - \mu_{j(t_i,y_i)}(t_i) \big) \cdot (e_m)_{j(t_i,y_i)}(t_i)$; then

Hn = ^E( Z * " 

= 4E (Zi - EZi) 2 

( k k \ 

4 EEII^-^IU-+ E Ig| 2 J =:C. 





By Assumptions 1 and 2 and since $\mu \in (L^\infty)^k$ (by the embedding of $(H^s)^k$ into $(L^\infty)^k$), $C$ is finite. Therefore,


dY n /i; 


1 II (Z/ 2 )* 


> M < —E 
~ M 


Which implies ()Y n ji 


dY n /i; ■ 


1 II (L 2 )* 


by Markov’s inequality 


< y IK ; e m )| ^ E (| | 

“ ^ ||i/ n || (L 2 )fc M Vl JU 

< AT b y Holder’s inequality 


MI^l 2 )* M 


< 


M ' 


1 ( 1 - 2 )* 


= O p ( 1 ). 


□ 


We now have the necessary pieces in place to prove Theorem 4.1 and Corollary 4.2. 

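Proof of Theorem 4.1. By Theorem 3.1 we have that (up to subsequences) $\|\mu^n - \mu^\infty\|_{(L^2)^k} = o_p(1)$, $\|\mu^n - \mu^\infty\|_{(L^\infty)^k} = o_p(1)$ and $\|\mu^n\|_{(H^s)^k} = O_p(1)$.

By Theorem 2.5, for some $t \in [0,1]$, we have

$$\begin{aligned}
f_\infty(\mu^n) &\ge f_\infty(\mu^\infty) + \partial f_\infty(\mu^\infty; \mu^n - \mu^\infty) + \frac{1}{2} \partial^2 f_\infty\big( (1-t)\mu^\infty + t\mu^n; \mu^n - \mu^\infty \big) \\
&\ge f_\infty(\mu^\infty) + 2\lambda \|\nabla^s (\mu^n - \mu^\infty)\|_{(L^2)^k}^2 - 2 \|\mu^n - \mu^\infty\|_{(L^\infty)^k}^2
\end{aligned}$$

after applying Lemma 4.5 and since $\mu^\infty$ minimizes $f_\infty$ the first derivative must be zero.

Similarly, and using Lemma 4.6,

$$Y_n(\mu^n) = Y_n(\mu^\infty) + O_p\big( \partial Y_n(\mu^\infty; \mu^n - \mu^\infty) \big) = Y_n(\mu^\infty) + O_p\big( \|\mu^n - \mu^\infty\|_{(L^2)^k} \big).$$

From the definition of $Y_n$ we also have

$$f_n(\mu^n) = f_\infty(\mu^n) + \frac{1}{\sqrt{n}} Y_n(\mu^n).$$

Substituting into the above we obtain

$$\begin{aligned}
f_n(\mu^n) &\ge f_\infty(\mu^\infty) + \frac{1}{\sqrt{n}} Y_n(\mu^n) + 2\lambda \|\nabla^s (\mu^n - \mu^\infty)\|_{(L^2)^k}^2 - 2 \|\mu^n - \mu^\infty\|_{(L^\infty)^k}^2 \\
&= f_\infty(\mu^\infty) + \frac{1}{\sqrt{n}} Y_n(\mu^\infty) + O_p\left( \frac{\|\mu^n - \mu^\infty\|_{(L^2)^k}}{\sqrt{n}} \right) + 2\lambda \|\nabla^s (\mu^n - \mu^\infty)\|_{(L^2)^k}^2 - 2 \|\mu^n - \mu^\infty\|_{(L^\infty)^k}^2 \\
&= f_n(\mu^\infty) + O_p\left( \frac{\|\mu^n - \mu^\infty\|_{(L^2)^k}}{\sqrt{n}} + \|\mu^n - \mu^\infty\|_{(L^\infty)^k}^2 \right) + 2\lambda \|\nabla^s (\mu^n - \mu^\infty)\|_{(L^2)^k}^2.
\end{aligned}$$

Rearranging and using $f_n(\mu^n) \le f_n(\mu^\infty)$ we have

$$\begin{aligned}
2\lambda \|\nabla^s (\mu^n - \mu^\infty)\|_{(L^2)^k}^2 &\le \big( f_n(\mu^n) - f_n(\mu^\infty) \big) + O_p\left( \frac{\|\mu^n - \mu^\infty\|_{(L^2)^k}}{\sqrt{n}} + \|\mu^n - \mu^\infty\|_{(L^\infty)^k}^2 \right) \\
&\le O_p\left( \frac{\|\mu^n - \mu^\infty\|_{(L^2)^k}}{\sqrt{n}} + \|\mu^n - \mu^\infty\|_{(L^\infty)^k}^2 \right).
\end{aligned}$$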

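We have shown, via Theorem 3.1, that $\|\nabla^s (\mu^n - \mu^\infty)\|_{(L^2)^k} \overset{p}{\to} 0$ and therefore $\mu^n \to \mu^\infty$ strongly in $H^s$ and in probability. □

Proof of Corollary 4.2. The proof is similar to the proof of Theorem 4.1 since

$$\begin{aligned}
f_\infty(\mu^n) &\ge f_\infty(\mu^\infty) + \partial f_\infty(\mu^\infty; \mu^n - \mu^\infty) + \int_0^1 (1 - t)\, \partial^2 f_\infty\big( (1-t)\mu^\infty + t\mu^n; \mu^n - \mu^\infty \big)\, dt \\
&\ge f_\infty(\mu^\infty) + \kappa \|\mu^n - \mu^\infty\|_{(H^s)^k}^2.
\end{aligned}$$

One can then show

$$\begin{aligned}
f_\infty(\mu^n) - f_\infty(\mu^\infty) &= f_n(\mu^n) - \frac{1}{\sqrt{n}} Y_n(\mu^n) - f_\infty(\mu^\infty) \\
&= f_n(\mu^n) - f_\infty(\mu^\infty) - \frac{1}{\sqrt{n}} Y_n(\mu^\infty) + O_p\left( \frac{\|\mu^n - \mu^\infty\|_{(L^2)^k}}{\sqrt{n}} \right) \\
&= f_n(\mu^n) - f_n(\mu^\infty) + O_p\left( \frac{\|\mu^n - \mu^\infty\|_{(L^2)^k}}{\sqrt{n}} \right) \\
&\le O_p\left( \frac{\|\mu^n - \mu^\infty\|_{(L^2)^k}}{\sqrt{n}} \right).
\end{aligned}$$

Hence,

$$\kappa \|\mu^n - \mu^\infty\|_{(L^2)^k} \|\mu^n - \mu^\infty\|_{(H^s)^k} \le \kappa \|\mu^n - \mu^\infty\|_{(H^s)^k}^2 \le O_p\left( \frac{\|\mu^n - \mu^\infty\|_{(L^2)^k}}{\sqrt{n}} \right).$$

Dividing by $\|\mu^n - \mu^\infty\|_{(L^2)^k}$ completes the proof.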